
Overview of Security Issues in Big Data Analytics


1. M. J. Bharathi, 2. Dr. V. N. Rajavarman
1. Research Scholar (Ph.D.), Dr. M.G.R. Educational and Research Institute, University
2. Professor & Head, CSE (Phase II), Dr. M.G.R. Educational and Research Institute, University
Big Data

Big Data is a term for high-volume, high-velocity, complex and multivariate data that requires advanced technologies to capture, store, distribute, manage and analyze the information it contains.
Data Forms
• Structured: all data is organized in terms of entities and relations (or classes), attributes and a schema.
• Semi-structured: data that is only partially structured; within a group, the size and type of the same attributes may differ.
• Unstructured: formats that cannot be easily indexed, such as audio, video and image files.
LITERATURE SURVEY

• Threats to privacy of information
• Data interoperability and information security
• Information security in web-enabled healthcare provision
• Information security for authorized data disclosure
Literature Review: IoT, Cloud Computing and Big Data with MapReduce
• IoT: IoT infrastructure is well suited to the integration, collection, processing, transmission and delivery of context information. It combines a context model with event-based organisation of services.
• Cloud computing: the practice of using a network of remote servers hosted on the Internet to store, manage and process data, rather than a local server or a personal computer.
• MapReduce: a programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster.
IoT
Healthcare data types and their sources
• Machine-generated data: sensors, smart meters, wearable devices
• Biometric data: genetics, heart rate, retina prints, blood pressure, signature, retinal scan, pulse, X-ray
• Human-generated data: case notes, laboratory results, hospital admission records, discharge summaries, e-mails, electronic health record data
• Transactional data: billing and healthcare claims
• Behavioural data: social interactions through websites and/or social media sites such as Facebook and Twitter
• Epidemiological data: disease registries, health surveys, statistical data
• Publication data: clinical research and medical reference materials
Cloud computing
The three service models of cloud computing
Infrastructure as a Service (IaaS)
A third party hosts infrastructure elements such as hardware, software, servers and storage, and also provides backup, security and maintenance.
Software as a Service (SaaS)
Software applications are delivered over the Internet and used through a web browser or client application, without the user having to install and maintain them locally.
Platform as a Service (PaaS)
Users develop, run and manage applications without having to build and maintain the underlying infrastructure, such as servers, storage and operating systems.
MapReduce
MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster.
• "Map" step: each worker node applies the "map()" function to its local data and writes the output to temporary storage. A master node ensures that only one copy of redundant input data is processed.
• "Shuffle" step: worker nodes redistribute data based on the output keys (produced by the "map()" function), so that all data belonging to one key is located on the same worker node.
• "Reduce" step: worker nodes process each group of output data, per key, in parallel.
[Figure: MapReduce architecture]
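The three steps can be illustrated with the classic word-count job. The following is a minimal single-process Python sketch for illustration only: real MapReduce runs map and reduce tasks on separate worker nodes, whereas here everything runs in one process, and the names `map_fn`, `shuffle`, `reduce_fn` and `word_count` are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_fn(document: str) -> Iterator[tuple[str, int]]:
    """Map step: emit (word, 1) for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle step: group intermediate values by key, as if every value
    for a given key were routed to the same worker node."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_fn(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce step: combine all values that share one key."""
    return key, sum(values)


def word_count(splits: list[str]) -> dict[str, int]:
    intermediate = (pair for split in splits for pair in map_fn(split))
    grouped = shuffle(intermediate)
    return dict(reduce_fn(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big security", "data security"]))
    # {'big': 2, 'data': 2, 'security': 2}
```

In a cluster, each split would be mapped on the node that stores it, which is why the shuffle step has to move data between nodes.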
CHALLENGES OF BIG DATA
• Analytics architecture
• Statistical significance
• Distributed mining
• Time-evolving data
• Compression
• Visualization
• Hidden big data
• Privacy, security and trust
• Data management and sharing
• Big data analysis platforms and tools [26]: Hadoop and MapReduce, GridGain, HPCC, Storm
• Databases / warehouses: Apache HBase, MongoDB, Hypertable, Hive
• Business intelligence: provides insights from data collected from various sources
• Data mining: derives the required information, in an understandable format, from the available data set
• File systems: Gluster, HDFS
• Programming languages: Apache Pig, ECL
BIG DATA POTENTIALS
• Research has focused on productivity, competitiveness and growth, the evolution of global financial markets and the economic impact of technology
– Application areas: marketing, healthcare, social media, automation, manufacturing industries, defence, smart cities
REASONS FOR SECURITY AND PRIVACY ISSUES AND CHALLENGES IN BIG DATA
• Big data is now widely accessible
• Existing technologies lack adequate security
• Technologies can be breached both accidentally and intentionally
SECURITY ISSUES AND CHALLENGES IN BIG DATA
• Security in big data is magnified by the three V’s: Volume, Variety and Velocity
• Big data security challenges can be grouped into four major categories:
- Infrastructure security
- Data privacy
- Data management
- Integrity and reactive security, including real-time security monitoring
Infrastructure Security
Secure Computation in Distributed Programming Frameworks
- MapReduce framework
• Major attack-prevention measures
- Securing the mappers
- Securing the data in the presence of an untrusted mapper
Threats to mappers
• Malfunctioning compute worker nodes
• Infrastructure attacks
• Rogue data nodes
Security Best Practices for Non-Relational Data Stores
• Transactional integrity
• Lax authentication mechanisms
• Insufficient authorization mechanisms
• Susceptibility to injection attacks
• Lack of consistency
• Insider attacks
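Injection is a concrete risk for non-relational stores because many of them accept queries as structured documents in which user input can smuggle in query operators. The sketch below is a hedged illustration that assumes a MongoDB-style filter syntax, where dicts and `$`-prefixed keys act as operators; the helper `build_patient_filter` is hypothetical and calls no database driver. Accepting only plain scalar values means a payload such as `{"$ne": None}` is rejected instead of matching every record.

```python
from typing import Any

# Scalar types that may appear as a field value in an equality filter.
ALLOWED_SCALARS = (str, int, float, bool)


class UnsafeInputError(ValueError):
    """Raised when user input could change the structure of a query."""


def build_patient_filter(field: str, user_value: Any) -> dict[str, Any]:
    """Build an equality filter {field: value} from untrusted input.

    Hypothetical helper: assumes a MongoDB-style query language in which
    dicts and '$'-prefixed keys are interpreted as operators.
    """
    # Only simple identifiers as field names (no dotted paths or '$').
    if not field.isidentifier():
        raise UnsafeInputError(f"illegal field name: {field!r}")
    # Reject dicts, lists, None, etc., so input such as {"$ne": None}
    # can never be turned into a query operator.
    if not isinstance(user_value, ALLOWED_SCALARS):
        raise UnsafeInputError(f"non-scalar value for {field!r}")
    # Conservative extra check: refuse strings that look like operators
    # or field references.
    if isinstance(user_value, str) and user_value.startswith("$"):
        raise UnsafeInputError("values may not start with '$'")
    return {field: user_value}


if __name__ == "__main__":
    print(build_patient_filter("patient_id", "P-1042"))
    try:
        build_patient_filter("patient_id", {"$ne": None})
    except UnsafeInputError as exc:
        print("rejected:", exc)
```

The design choice is an allow-list (known-safe scalar types) rather than a deny-list of dangerous operators, so operators added in future query-language versions are rejected by default.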
Scalable Privacy-Preserving Data Mining and Analytics
• Threats from insider analysts and untrusted partners
• Privacy-preserving analytics
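One widely used privacy-preserving analytics technique, named here as a swapped-in example rather than a method from this survey, is differential privacy. The sketch applies the Laplace mechanism to a counting query: adding or removing one patient changes a count by at most 1 (sensitivity 1), so Laplace noise with scale 1/ε masks any single record. The function names and sample records are hypothetical; the noise is drawn as the difference of two exponential samples, which is Laplace-distributed. A production system would also need a cryptographically secure random source and care with floating-point effects.

```python
import random
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample: difference of two i.i.d. exponentials
    with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(records: Iterable[T], predicate: Callable[[T], bool],
             epsilon: float, rng: random.Random) -> float:
    """Epsilon-differentially private count via the Laplace mechanism.

    A counting query has sensitivity 1 (one person changes it by at most
    1), so the noise scale is 1 / epsilon; smaller epsilon = more noise.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    patients = [{"age": a, "diabetic": a % 3 == 0} for a in range(30, 90)]
    noisy = dp_count(patients, lambda p: p["diabetic"], 0.5, random.Random(7))
    print(f"released count: {noisy:.1f} (true count is 20)")
```

An analyst, trusted or not, only ever sees the noisy value; each additional query consumes more of the privacy budget ε.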

Cryptographically Enforced Data-Centric Security
• Visibility of data to different entities can be controlled by two major approaches:
- Limiting access to the underlying system
- Encapsulating the data itself in a protective shell using cryptography
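Granular Access Control
• From the perspective of access control, the property that matters is secrecy: preventing access to data by people who should not have access
• Granular access control gives data managers a scalpel instead of a sword, so they can share data as much as possible without compromising secrecy.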
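The "scalpel instead of a sword" idea can be pictured as labelling each field of a record and releasing only the fields whose labels a user is cleared for, instead of granting or denying the whole record. The sketch below is a simplified, hypothetical attribute-based model; the labels (`public`, `clinical`, `billing`, `genomic`), the `User` type and the function `redact` are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class User:
    name: str
    # Labels this user is cleared to read.
    clearances: frozenset[str] = field(default_factory=frozenset)


# Hypothetical per-field security labels for a patient record.
FIELD_LABELS: dict[str, str] = {
    "patient_id": "public",
    "diagnosis": "clinical",
    "blood_pressure": "clinical",
    "insurance_claim": "billing",
    "genome_variant": "genomic",
}


def redact(record: dict[str, object], user: User) -> dict[str, object]:
    """Return only the fields whose label the user holds.

    Unlabelled fields are treated as secret (default deny), so a new
    field is never leaked just because nobody has labelled it yet.
    """
    visible = {}
    for name, value in record.items():
        label = FIELD_LABELS.get(name)
        if label is not None and label in user.clearances:
            visible[name] = value
    return visible


if __name__ == "__main__":
    record = {"patient_id": "P-7", "diagnosis": "T2D",
              "insurance_claim": 1200, "genome_variant": "rs7903146"}
    clerk = User("clerk", frozenset({"public", "billing"}))
    print(redact(record, clerk))  # {'patient_id': 'P-7', 'insurance_claim': 1200}
```

Because the decision is made per field, the same record can be shared with a billing clerk and a clinician without either seeing data outside their role.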
Secure Data Storage and Transaction Logs
• Confidentiality and integrity
• Availability
• Provenance
• Collusion attacks
• Roll-back attacks
• Disputes
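Integrity of stored transaction logs, and detection of roll-back, is often approached with a hash chain: each entry stores the hash of the previous entry, so changing an earlier entry breaks every later link. The following Python sketch illustrates this under stated assumptions: the class name `HashChainLog` is hypothetical, SHA-256 is assumed as the hash, and roll-back is detected by comparing against a length and head hash that the client kept from an earlier, trusted read. A party able to rewrite the whole chain could recompute every hash, which is why that externally kept head matters.

```python
import hashlib
import json
from dataclasses import dataclass

GENESIS = "0" * 64  # "previous hash" used for the first entry


@dataclass(frozen=True)
class Entry:
    payload: dict
    prev_hash: str
    entry_hash: str


def _digest(payload: dict, prev_hash: str) -> str:
    # Canonical JSON (sorted keys) so equal payloads always hash the same.
    body = json.dumps(payload, sort_keys=True) + prev_hash
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class HashChainLog:
    def __init__(self) -> None:
        self.entries: list[Entry] = []

    def head(self) -> str:
        return self.entries[-1].entry_hash if self.entries else GENESIS

    def append(self, payload: dict) -> str:
        prev = self.head()
        entry = Entry(payload, prev, _digest(payload, prev))
        self.entries.append(entry)
        return entry.entry_hash

    def verify(self) -> bool:
        """Recompute every link; editing any entry breaks the chain."""
        prev = GENESIS
        for e in self.entries:
            if e.prev_hash != prev or e.entry_hash != _digest(e.payload, prev):
                return False
            prev = e.entry_hash
        return True

    def is_rolled_back(self, trusted_len: int, trusted_head: str) -> bool:
        """Compare with a (length, head hash) pair saved from a trusted read."""
        if trusted_len == 0:
            return False
        if len(self.entries) < trusted_len:
            return True  # entries have disappeared
        return self.entries[trusted_len - 1].entry_hash != trusted_head


if __name__ == "__main__":
    log = HashChainLog()
    for seq in range(3):
        log.append({"seq": seq, "op": "write"})
    saved = (len(log.entries), log.head())
    print(log.verify(), log.is_rolled_back(*saved))  # True False
    log.entries = log.entries[:2]  # storage serves an older copy
    print(log.is_rolled_back(*saved))  # True
```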
Granular Audits
• Audit records are needed to understand what happened at the moment an attack took place
Integrity and Reactive Security
• End-point input validation
• Various threats affect input validation:
- An attacker might perform an identity-cloning attack
- An attacker can alter the input by simulating a GPS satellite signal (GPS spoofing)
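Two common defences against these threats can be sketched under stated assumptions. First, each registered device shares a secret key with the collector, so a cloned identity without that key cannot produce a valid tag (HMAC-SHA256 is assumed). Second, each reading is checked for physical plausibility, here a maximum travel speed between consecutive GPS fixes to flag a spoofed position jump. The device registry, key, 300 m/s limit and function names are hypothetical.

```python
import hashlib
import hmac
import math
from dataclasses import dataclass
from typing import Optional

# Hypothetical registry: device ID -> secret shared at enrolment.
DEVICE_KEYS = {"wban-17": b"enrolment-secret-17"}
MAX_SPEED_M_PER_S = 300.0  # assumed plausibility bound


@dataclass(frozen=True)
class Reading:
    device_id: str
    timestamp: float  # seconds
    lat: float        # degrees
    lon: float        # degrees
    tag: bytes = b""  # HMAC-SHA256 over the fields above


def _message(r: Reading) -> bytes:
    return f"{r.device_id}|{r.timestamp}|{r.lat}|{r.lon}".encode()


def sign_reading(device_id: str, timestamp: float, lat: float, lon: float,
                 key: bytes) -> Reading:
    """What an honest device does: attach an HMAC tag to its reading."""
    unsigned = Reading(device_id, timestamp, lat, lon)
    tag = hmac.new(key, _message(unsigned), hashlib.sha256).digest()
    return Reading(device_id, timestamp, lat, lon, tag)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres (mean Earth radius 6,371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6_371_000 * math.asin(math.sqrt(a))


def validate(r: Reading, previous: Optional[Reading]) -> tuple[bool, str]:
    key = DEVICE_KEYS.get(r.device_id)
    if key is None:
        return False, "unknown device"
    expected = hmac.new(key, _message(r), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, r.tag):
        return False, "bad tag (possible identity cloning)"
    if previous is not None:
        dt = r.timestamp - previous.timestamp
        if dt <= 0:
            return False, "non-increasing timestamp (possible replay)"
        speed = haversine_m(previous.lat, previous.lon, r.lat, r.lon) / dt
        if speed > MAX_SPEED_M_PER_S:
            return False, "implausible position jump (possible GPS spoofing)"
    return True, "ok"
```

The tag check answers "is this really the registered device?", while the plausibility check answers "could this input be physically true?"; an attacker has to defeat both.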
A Survey on Big Data Management in Health Care Using IoT
Research Scholar: M. J. Bharathi, Dr. M.G.R. Educational and Research University
Supervisor: Dr. V. N. Rajavarman, Professor & Head, CSE (Phase II), Dr. M.G.R. Educational and Research University
Abstract
• Presents an application framework based on the Internet of Things.
• The approach incorporates accessibility, the capacity to customize, and practical delivery.
• Reviews wearable sensors and advanced smart healthcare systems built on IoT technologies.
Introduction
• Use of different sensor devices and technologies
• Healthcare providers from various disciplines
• Big data: turning raw data into valuable information
• Data needs to be aggregated from a variety of sources
• Diverse intelligent objects interact and exchange data
Health Care System
• Quality of health depends on health care that provides “services and delivers them in a comprehensive and integrated way”
• Acute care
• Community-based care
• Long-term care
Healthcare Network Issues
• Seamless integration
• An intermediary between the hardware and application layers
• Integration with medical devices
Big Data Needs
• To turn data into actions
• To promote preventive care
• To enhance patient satisfaction and engagement
• To advance care management
• To advance population health management
Healthcare Network Issues
• Healthcare topology
• Healthcare architecture
• Healthcare platform
Security of Big Data in Health Care
• Data governance
• Heterogeneity
• Real-time security analytics
• Disaster recovery
• Analytics for privacy preservation
Health Care IOT Challenges
• Data Privacy
• Flexibility and evolution of application
• Data Integration
• Managing device interoperability and diversity
• Scale, data volume and performance
Conclusion
• Evolving smart healthcare devices are a realistic way to transform existing healthcare.
• Increasing awareness of emerging diseases and implementing government schemes improves quality of life.
Future Implementation
• To address the security problems in wireless body area networks (WBAN), a secure data collection scheme for big data is proposed
• Sensor nodes must register with the big data center through a Certification Authority (CA) before connecting to the network
• Both sides authenticate each other using ECDSA (Elliptic Curve Digital Signature Algorithm)
• Information is transferred under security protection
References
1. https://www.ngdata.com/what-is-big-data-analytics
2. I. Olaronke, O. Oluwaseun, “Big data in healthcare: Prospects, challenges and resolutions”, In Future Technologies Conference (FTC), IEEE, pp. 1152-1157, December 2016.
3. R. Thomas-MacLean, D. Tarlier, S. Ackroyd-Stolarz, M. Fortin, M. Stewart, “No cookie-cutter response: conceptualizing primary health care”, 2014.
4. https://iot.ieee.org/images/files/pdf/networks-of-things_jeffvoas_5-31-2016.pdf
5. https://iotdunia.com/what-is-an-iot-platform
6. https://hitconsultant.net/2017/11/03/internet-things-digitalfuture-value-based-care/
7. C. Doukas, I. Maglogiannis, “Bringing IoT and cloud computing towards pervasive healthcare”, In Innovative Mobile and Internet Services in Ubiquitous Computing (IMIS), 2012 Sixth International Conference on, IEEE, pp. 922-926, July 2012.
8. S. R. Islam, D. Kwak, M. H. Kabir, M. Hossain, K. S. Kwak, “The internet of things for health care: a comprehensive survey”, IEEE Access, 3, pp. 678-708, 2015.
9. K. Kavitha, G. Suseendran, “A Review on Security Issues of IOT Based on Various Technologies”, Journal of Advanced Research in Dynamical and Control Systems, Vol. 10 (4), June 2018.
10. H. K. Patil, R. Seshadri, “Big data security and privacy issues in healthcare”, Big Data (BigData Congress), 2014 IEEE International Congress on, IEEE, June 2014.
11. http://triotree.com/blog/medical-internet-of-thingschallenges- benefits-applications/
A Secured Data Retrieval Architecture for WBAN Using Elliptic Curve Digital Signature
Research Scholar: M. J. Bharathi, Dr. M.G.R. Educational and Research University
Supervisor: Dr. V. N. Rajavarman, Professor & Deputy Dean, Dr. M.G.R. Educational and Research University
IMPLEMENTATION OF DIGITAL SIGNATURE ALGORITHM USING BIG DATA SENSING ENVIRONMENT
Research Scholar: M. J. Bharathi, Dr. M.G.R. Educational and Research University
Supervisor: Dr. V. N. Rajavarman, Professor & Deputy Dean, Dr. M.G.R. Educational and Research University
Abstract
I. Big data is transmitted through MapReduce and retrieved safely using the ECDSA algorithm.
II. MapReduce accesses multiple data sets on multi-node hardware through distributed storage and processing, combining all values that share a key.
III. The CloudSim extensible toolkit is used to model the environment and to evaluate application provisioning.
Introduction
WBAN Architecture
– Medical and non-medical applications, standardized in IEEE 802.15.6, processed in a Hadoop environment
– Data rate, and interference caused by the coexistence of different technologies in the same location
– WBAN applications are determined by the choice of an appropriate radio technology.
[Figure: WBAN architecture]
Cloud Architecture
• Modeling and simulation of a virtual cloud data-center environment
• Interfaces for memory, VMs, bandwidth and storage
• Cloud computing structure: service consumers (SaaS providers), brokers and providers
• Utility-driven internetworking of clouds for application provisioning and workload submission
[Figure: Architecture of the cloud environment]
MapReduce Framework
• MapReduce is a programming model and an associated implementation for processing and generating large datasets
• Mappers map input key/value pairs to a set of intermediate key/value pairs
• The mapper outputs are sorted and then partitioned among the reducers
[Figure: MapReduce framework]
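Dividing mapper output among reducers is commonly done by hashing the key modulo the number of reducers, which is also the idea behind Hadoop's default `HashPartitioner`. The Python sketch below mimics that idea; it uses `zlib.crc32` instead of Python's built-in `hash()` because string hashing is randomised per process, and the function name `partition` and the sample vital-sign pairs are hypothetical.

```python
import zlib
from collections import defaultdict


def partition(pairs: list[tuple[str, int]],
              num_reducers: int) -> dict[int, list[tuple[str, int]]]:
    """Route intermediate (key, value) pairs to reducers.

    reducer index = crc32(key) mod num_reducers, so all pairs with the
    same key reach the same reducer; each partition is then sorted.
    """
    if num_reducers < 1:
        raise ValueError("need at least one reducer")
    partitions: dict[int, list[tuple[str, int]]] = defaultdict(list)
    for key, value in pairs:
        reducer = zlib.crc32(key.encode("utf-8")) % num_reducers
        partitions[reducer].append((key, value))
    return {r: sorted(items) for r, items in partitions.items()}


if __name__ == "__main__":
    vitals = [("hr", 72), ("bp", 120), ("hr", 75), ("spo2", 98), ("bp", 118)]
    for reducer, items in sorted(partition(vitals, 3).items()):
        print(reducer, items)
```

Hash partitioning balances load well when keys are numerous and evenly used; a few very frequent keys can still overload one reducer.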
Proposed Architecture
– Using existing devices and technologies to examine and collect a tremendous amount of information is not sufficient, because they cannot extract the essential representative data sets
– The proposed architecture analyzes data along separate paths in both offline and real-time modes.
[Figure: Architecture of real-time big data sensing]
REAL-TIME BIG DATA SENSING
• The real-time big data sensing architecture contains three main parts: the aggregation and compilation server, the results storage server(s) and the decision-making server
• The aggregation and compilation server receives the partial outputs processed by Hadoop; these outcomes are ready to compile, although they may not yet be in structured, compiled form
• The server combines the related results and arranges them into a usable form for further processing and loading
Elliptic Curve Cryptography-Based Big Data Retrieval Unit
• ECDSA key generation produces a key pair (x, y):
• x is the private key, a random integer
• the public key y is an elliptic curve point
• the public key y is computed from the private key x as y = xG, where G is the base point of the curve.
Session Key Generation in Encryption
• To test verification of both correct and incorrect signatures for each selected modulus size, ECDSA creates a key pair (p, q); the private key p is used to sign a number of pseudorandom 1024-bit messages.
• The implementation under test (IUT) is given the messages, signatures, domain parameters and public key q.
• The IUT attempts to verify the signatures and returns the results to the ECDSA test system, which compares the received results with its stored results.
Key Generation Algorithm

Algorithm Key-Generation
{
// Purpose: calculate an ECDSA key pair
// Input: domain parameters D = (q, FR, a, b, G, n, h)
//   E: elliptic curve with coefficients a, b defined over F_q
//   G: base point of prime order n in E(F_q); h: cofactor
// Output: key pair (public key y, private key x)
Select a random integer x in the interval [1, n-1]
Calculate y = xG
Return (y, x)   // y: public key, x: private key
}
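The key-generation algorithm and the signature-verification test described above can be sketched end to end in pure Python. This is a teaching sketch under stated assumptions, not the authors' implementation and not constant-time production code: it uses the published secp256k1 domain parameters (chosen only because they are compact to write down), SHA-256 for message hashing, and hypothetical function names. Following the test procedure, it signs pseudorandom 1024-bit (128-byte) messages, alters every second one, and compares the verifier's answers with the stored expected results.

```python
import hashlib
import secrets

# secp256k1: y^2 = x^3 + ax + b over F_P, base point G of prime order N.
P = 2**256 - 2**32 - 977
A, B = 0, 7
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141


def on_curve(pt):
    x, y = pt
    return (y * y - (x ** 3 + A * x + B)) % P == 0


def point_add(p1, p2):
    """Affine point addition; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # P + (-P) = infinity
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P) % P  # tangent slope
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, P) % P  # chord slope
    x3 = (lam * lam - x1 - x2) % P
    return x3, (lam * (x1 - x3) - y1) % P


def scalar_mult(k, pt):
    """Double-and-add computation of k * pt (not constant-time)."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, pt)
        pt, k = point_add(pt, pt), k >> 1
    return result


def generate_keypair():
    """Key generation: private x random in [1, N-1], public y = xG."""
    x = secrets.randbelow(N - 1) + 1
    return x, scalar_mult(x, G)


def _z(msg):
    return int.from_bytes(hashlib.sha256(msg).digest(), "big")


def sign(x, msg):
    while True:
        k = secrets.randbelow(N - 1) + 1  # fresh secret nonce per signature
        r = scalar_mult(k, G)[0] % N
        s = pow(k, -1, N) * (_z(msg) + r * x) % N
        if r and s:
            return r, s


def verify(y, msg, sig):
    r, s = sig
    if not (1 <= r < N and 1 <= s < N):
        return False
    w = pow(s, -1, N)
    pt = point_add(scalar_mult(_z(msg) * w % N, G), scalar_mult(r * w % N, y))
    return pt is not None and pt[0] % N == r


def run_sigver_test(count=6):
    """Sign pseudorandom 1024-bit messages, alter every second one, and
    check that the verifier's answers equal the stored expected results."""
    x, y = generate_keypair()
    cases = []
    for i in range(count):
        msg = secrets.token_bytes(128)  # 1024-bit message
        sig = sign(x, msg)
        if i % 2:
            msg = bytes([msg[0] ^ 1]) + msg[1:]  # flip one bit
        cases.append((msg, sig, i % 2 == 0))  # stored expected result
    return all(verify(y, m, s) == expected for m, s, expected in cases)


if __name__ == "__main__":
    print("SigVer results match stored results:", run_sigver_test())
```

A vetted library should be used in practice; the sketch only makes the arithmetic y = xG, signing and verification visible.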
Session Key Generation in Encryption

Algorithm Transmit(IP, ID, PU)
{
Initialize Flag for the destination;
Flag_r = Hash(IP_r, ID_r, PU_r);
Receiver sends HDR, SA, PU_r, Flag_r, N_r;
return Flag_r;
}
Algorithm Certify(Flag_r, PU_r)
{
Initialize Flag_r received from the receiver at the sender;
Certify Flag_r using Key-Generation;
Generate key pairs for authentication;
}
Algorithm Encryption(Flag_r, key p, key q)
{
Mutually certify the key agreement and validate Flag_i;
Generate (Q_p, Q_q);
E_k(N_r || N_i || IP_r);
return Flag_i;
}
Result
• The execution time of the proposed ECC is lower than that of Novel ECC and key-policy attribute-based encryption.
• The security of ECC is higher than that of key-policy attribute-based encryption, since the encryption depends on security attributes.
• User access to services and confidentiality are achieved through mutual authentication.
• User secret-key accountability is achieved by ECC-based encryption, which protects the key users.
• Data confidentiality is achieved through trust-based, attribute-based encryption using the novel scheme.
[Figure: Comparison graph]
• The ECDSA method is used to generate a session key for authentication and verification.
• During execution, it generates a one-time nonce without repetition.
• Decryption algorithms are to be defined in the MapReduce functions, and the performance of the proposed ECC algorithm is to be compared with Novel ECC.
• User secret-key accountability and data confidentiality are to be achieved by nonce-based encryption that protects the key users.
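A one-time nonce only prevents replay if the receiver refuses any nonce it has already seen. A minimal Python sketch of that bookkeeping follows; the class `NonceRegistry`, the 128-bit nonce length and the 300-second window are assumptions for illustration, and the sketch assumes each message's timestamp is covered by its signature or MAC so an attacker cannot change it. Each nonce is remembered until its timestamp would fail the freshness check anyway, which keeps the memory bounded.

```python
import secrets
import time
from typing import Optional


def new_nonce() -> bytes:
    """128-bit random nonce; accidental repeats are negligible."""
    return secrets.token_bytes(16)


class NonceRegistry:
    """Accept each nonce at most once within a freshness window."""

    def __init__(self, window_s: float = 300.0) -> None:
        self.window_s = window_s
        self._seen: dict[bytes, float] = {}  # nonce -> time it may be forgotten

    def accept(self, nonce: bytes, sent_at: float,
               now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        if abs(now - sent_at) > self.window_s:
            return False  # outside the freshness window
        # Forget nonces whose timestamps could no longer pass the check above.
        self._seen = {n: exp for n, exp in self._seen.items() if exp >= now}
        if nonce in self._seen:
            return False  # replay
        self._seen[nonce] = sent_at + self.window_s
        return True
```

Usage: the sender attaches `new_nonce()` and a timestamp to each message, and the receiver calls `accept(nonce, sent_at)` before processing it.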
A Secured Data Retrieval Architecture for WBAN Using Elliptic Curve Digital Signature

Abstract
I. A secure big data collection scheme is proposed to deal with the security problems in WBAN.
II. Sensor nodes register with a CA (Certification Authority) to connect to the big data center network.
III. The sensors and the big data center authenticate each other on both sides using ECDSA.
IV. Sensor data is kept in distributed storage, and the collected data is transferred with improved security protection.
Introduction
• WBAN: input
• Big data analytics
– Hadoop environment
– Hadoop clustering
– HDFS (files stored in HDFS)
– Proposed ECC protocol
System Architecture
The big data architecture is split into three parts:
1) Cloud Data Processing (CDP)
2) Big-data Storage Unit (BSU)
3) Health Care Big data Retrieval Unit (HCBRU)
[Figure: Big data architecture]
Cloud Big-data Processing (CDP)
• Efficient data analysis: remote-sensing satellite data is preprocessed under many conditions to integrate data from various sources
• Preprocessing reduces storage requirements and also improves the accuracy of analysis; the accumulated information is sent to a ground station over a downlink channel
• The received data is processed in two modes: real time and offline
– Offline: the data center receives the data from the earth base station and stores it for future estimation
– Real time: to reduce processing time, the data is communicated directly to the filtration and load-balancer server
Big-data Storage Unit (BSU)
• Data analysis: useful information is identified by filtration and the rest is discarded
• The filtered data is distributed for various kinds of processing
• Real-time results are generated in each segment and passed on for compilation, organization and storage for further processing
Health Care Big data Retrieval Unit (HCBRU)
HCBRU consists of:
– an aggregation and compilation server,
– results storage server(s),
– a decision-making server.
• It compiles, organizes, stores and transmits the results, supporting various algorithms during compilation
• A copy of the compiled results is sent to the decision-making server, which takes the decision
• Applications then use the decision in their further development
[Figure: Flowchart of the big data architecture]
HADOOP ENVIRONMENT
• Hadoop is designed to run on a large number of machines that do not share memory
• When huge data is loaded into Hadoop, it is partitioned into pieces that are spread across different servers
• A central control node runs the NameNode, which keeps track of HDFS directories and files, and the JobTracker, which dispatches compute tasks to the TaskTrackers
HADOOP CLUSTERING
• Hadoop clustering is implemented on the JVM
• JobTracker: master node that schedules jobs and distributes their tasks to the TaskTrackers
• TaskTracker: slave node that executes the map and reduce tasks assigned to it
• NameNode: controls HDFS, managing the file-system namespace and the location of every block; it supports fault tolerance by re-replicating blocks when a DataNode fails
• DataNode: part of HDFS, stores the file blocks
HDFS (Hadoop Distributed File System)
• Files are split into blocks, and the blocks are distributed across many machines at load time
• Each block is replicated on different machines, and the NameNode keeps track of where the blocks of each file are stored
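The load-time behaviour described above (splitting a file into fixed-size blocks, replicating each block on several DataNodes, and recording the locations on the NameNode) can be modelled in a few lines. This Python sketch is a deliberate simplification under stated assumptions: it uses a toy block size instead of the HDFS default (128 MB in Hadoop 2 and later), a replication factor of 3 (the HDFS default), and simple round-robin placement rather than HDFS's rack-aware policy. The class `ToyHDFS` and its fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHDFS:
    """Toy model: NameNode metadata plus per-DataNode block storage."""
    datanode_names: list[str]
    replication: int = 3   # HDFS default replication factor
    block_size: int = 8    # toy size in bytes (HDFS default is 128 MB)
    # NameNode metadata: path -> [(block index, [DataNodes holding it])]
    block_map: dict[str, list[tuple[int, list[str]]]] = field(default_factory=dict)
    # DataNode storage: node -> {(path, block index): bytes}
    datanodes: dict[str, dict[tuple[str, int], bytes]] = field(default_factory=dict)

    def put(self, path: str, data: bytes) -> None:
        if self.replication > len(self.datanode_names):
            raise ValueError("fewer DataNodes than the replication factor")
        layout = []
        for idx, start in enumerate(range(0, len(data), self.block_size)):
            block = data[start:start + self.block_size]
            # Round-robin placement on distinct nodes (real HDFS is rack-aware).
            nodes = [self.datanode_names[(idx + r) % len(self.datanode_names)]
                     for r in range(self.replication)]
            for node in nodes:
                self.datanodes.setdefault(node, {})[(path, idx)] = block
            layout.append((idx, nodes))
        self.block_map[path] = layout

    def get(self, path: str, failed: frozenset[str] = frozenset()) -> bytes:
        """Reassemble a file, reading each block from any live replica."""
        parts = []
        for idx, nodes in self.block_map[path]:
            live = [n for n in nodes if n not in failed]
            if not live:
                raise OSError(f"all replicas of block {idx} of {path} are lost")
            parts.append(self.datanodes[live[0]][(path, idx)])
        return b"".join(parts)


if __name__ == "__main__":
    fs = ToyHDFS(["dn1", "dn2", "dn3", "dn4"])
    fs.put("/wban/p1.log", b"hr=72 bp=120/80 spo2=98")
    print(fs.block_map["/wban/p1.log"])
    print(fs.get("/wban/p1.log", failed=frozenset({"dn1", "dn2"})))
```

With three replicas per block, the file stays readable even when two of the four DataNodes fail, which is the fault-tolerance property the NameNode's bookkeeping provides.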
Elliptic Curve Cryptography Certified Protocol
Notation: HDR: header, SA: security association, Flg_r: flag (digest) of the receiver, IP: IP address, PU: public key, ID: device ID, IKE: Internet Key Exchange
Proposed Algorithm
Step 1: Initiator → Responder: HDR, SA_i
The initiator sends the offered cryptographic suites to the responder.
Step 2: Responder → Initiator: HDR, SA_r, PU_r
The responder selects an algorithm from the offered suites, calculates the digest Flg_r = h(IP_r, ID_r, PU_r) and sends it along with its public key and a nonce; the nonce prevents replay attacks. If the responder is not satisfied with any offered cryptographic suite, it sends an error message.
Step 3: Initiator → Responder: HDR, PU_i, Flg_i, E_kx(N_r || N_i)
The initiator computes its own Flg_r and compares it with the received value; on a mismatch the communication is terminated. Otherwise the initiator:
• calculates the session key,
• generates its own digest Flg_i and transmits this digest, its public key, and the sender and receiver nonces encrypted along with its IP.
• Hence the responder is verified and replay is checked.
Step 4: Responder → Initiator: E_kx(N_r || N_i || IP_r)
The responder verifies the initiator's digest Flg_i; if it is valid, the responder calculates the session key from the initiator's public key and its own private key. It sends the encrypted values to the initiator for mutual authentication.
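The four steps can be simulated end to end to show how the digests Flg, the nonces N_r and N_i, and the session key k_x fit together. This Python sketch is illustrative only and rests on explicit assumptions: ECDH over the secp256k1 curve for the session key, SHA-256 as the digest h, a SHA-256 hash of the shared point and both nonces as k_x, and an HMAC-SHA256 tag over the nonces in place of the encryption E_kx(...), since the Python standard library has no block cipher. The `Party` class, `handshake` function, addresses and device IDs are hypothetical, and the code is not constant-time.

```python
import hashlib
import hmac
import secrets

# secp256k1 (assumed curve): y^2 = x^3 + 7 over F_P, base point G of order N.
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def add(p1, p2):
    """Affine point addition; None is the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p1 == p2:
        lam = 3 * x1 * x1 * pow(2 * y1, -1, P) % P
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return x3, (lam * (x1 - x3) - y1) % P


def mul(k, pt):
    """Double-and-add scalar multiplication k * pt."""
    acc = None
    while k:
        if k & 1:
            acc = add(acc, pt)
        pt, k = add(pt, pt), k >> 1
    return acc


def flag_of(ip, dev_id, pub):
    """Flg = h(IP, ID, PU) with SHA-256 as the assumed digest h."""
    x, y = pub
    return hashlib.sha256(ip + dev_id + x.to_bytes(32, "big") + y.to_bytes(32, "big")).digest()


def tag(key, *parts):
    """Stand-in for E_kx(...): an HMAC that proves knowledge of k_x."""
    return hmac.new(key, b"|".join(parts), hashlib.sha256).digest()


class Party:
    def __init__(self, ip: str, dev_id: str):
        self.ip, self.dev_id = ip.encode(), dev_id.encode()
        self.priv = secrets.randbelow(N - 1) + 1
        self.pub = mul(self.priv, G)
        self.nonce = secrets.token_bytes(16)

    def session_key(self, peer_pub, n_r, n_i):
        """k_x = SHA-256(x-coordinate of priv * peer_pub || N_r || N_i)."""
        shared = mul(self.priv, peer_pub)
        return hashlib.sha256(shared[0].to_bytes(32, "big") + n_r + n_i).digest()


def handshake(init, resp, resp_identity):
    """Run Steps 2-4; return the session key, or None if a check fails.

    resp_identity is the (IP, device ID) the initiator expects for the
    responder, e.g. from CA registration. Step 1 is assumed to succeed.
    """
    # Step 2: responder -> initiator: PU_r, Flg_r, N_r
    pu_r, flg_r, n_r = resp.pub, flag_of(resp.ip, resp.dev_id, resp.pub), resp.nonce
    # Step 3: initiator recomputes Flg_r; on mismatch it terminates.
    if not hmac.compare_digest(flag_of(*resp_identity, pu_r), flg_r):
        return None
    k_i = init.session_key(pu_r, n_r, init.nonce)
    pu_i, flg_i, n_i = init.pub, flag_of(init.ip, init.dev_id, init.pub), init.nonce
    proof_i = tag(k_i, n_r, n_i)
    # Step 4: responder checks Flg_i and the initiator's proof, then confirms.
    if not hmac.compare_digest(flag_of(init.ip, init.dev_id, pu_i), flg_i):
        return None
    k_r = resp.session_key(pu_i, n_r, n_i)
    if not hmac.compare_digest(tag(k_r, n_r, n_i), proof_i):
        return None
    proof_r = tag(k_r, n_r, n_i, resp.ip)
    # Initiator checks the responder's confirmation (mutual authentication).
    if not hmac.compare_digest(tag(k_i, n_r, n_i, resp_identity[0]), proof_r):
        return None
    return k_i
```

Note that each digest is computed from public values only, so in this sketch it binds a public key to an identity but does not by itself authenticate it; the CA registration described earlier is what would make that binding trustworthy.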
Result
In the proposed ECC protocol:
• The initiator and responder each decide their private key
• They generate the corresponding public keys over a prime field, using nonces for freshness
• The initiator and responder agree to calculate derived keys, such as the authentication key, encryption keys and integrity key, from the session key elements
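Deriving separate authentication, encryption and integrity keys from one session key is usually done with a key-derivation function, so that exposing one derived key reveals nothing about the others. The sketch below assumes HKDF with SHA-256 (RFC 5869), implemented with the standard `hmac` module; the salt choice (the two nonces), the label strings and the function `derive_session_keys` are hypothetical, not taken from the protocol above.

```python
import hashlib
import hmac

HASH_LEN = 32  # SHA-256 output size in bytes


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract: PRK = HMAC-Hash(salt, IKM); empty salt -> zeros."""
    return hmac.new(salt or bytes(HASH_LEN), ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand: T(i) = HMAC-Hash(PRK, T(i-1) | info | i)."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested length too large for HKDF")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_session_keys(session_key: bytes, n_i: bytes,
                        n_r: bytes) -> dict[str, bytes]:
    """Split one session key into independent purpose-specific keys."""
    prk = hkdf_extract(n_i + n_r, session_key)
    return {
        label: hkdf_expand(prk, f"wban {label} key".encode(), 32)
        for label in ("authentication", "encryption", "integrity")
    }
```

Using a distinct `info` label per purpose is what makes the three keys independent even though they come from the same session key.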
Conclusion and Future Work
• We propose a protocol based on an elliptic curve cryptography certified algorithm that secures data communication efficiently, using a one-time token to strengthen the encryption and to reduce cost.
• Future work will develop new approaches to analyze the performance of secured data.
References
1. Chunqiang Hu, Hongjuan Li, Xiuzhen Cheng and Xiaofeng Liao, “Secure and Efficient data communication protocol for Wireless Body Area Networks”, IEEE Transactions On Multi-Scale Computing Systems, Vol. , No. , 11. 2015
2. Limin Ma, Yu Ge, Yuesheng Zhu, “TinyZKP: A Lightweight Authentication Scheme Based on Zero-Knowledge Proof for Wireless Body Area Networks”, Wireless Pers Commun (2014) 77:1077–1090
3. Amrita Roy Chowdhury, Tanusree Chatterjee, Sipra DasBit, “LOCHA: A Light-Weight One-way Cryptographic Hash Algorithm for Wireless Sensor Network”, The 5th International Conference on Ambient Systems, Networks and Technologies (ANT-2014).
4. Kyung-Ah Shim, Young-Ran Lee, Cheol-Min Park, “EIBAS: An efficient identity-based broadcast authentication scheme in wireless sensor networks”, Ad Hoc Networks 11 (2013) 182–189
5. Sanskruti Patel and Atul Patel, “A Big Data Revolution In Health Care Sector: Opportunities, Challenges And Technological Advancements”, International Journal of Information Sciences and Techniques (IJIST), Vol. 6, No. 1/2, March 2016
6. Isabel de la Torre, Begoña García-Zapirain, Miguel López-Coronado, “Analysis of Security in Big Data Related to Healthcare”, Journal of Digital Forensics, Security and Law, Volume 12, Number 3, Article 5
7. Muhammad Sheraz Arshad Malik, Muhammad Ahmed, Tahir Abdullah, Naila Kousar, Mehak Nigar Shumaila, “Wireless Body Area Network Security and Privacy Issue in E-Healthcare”, (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 9, No. 4, 2018
8. S. V. Manikanthan, T. Padmapriya, “A secured multi-level key management technique for intensified wireless sensor network”, International Journal of Recent Technology and Engineering, Vol. 7, No. 6S2, 2019
