Big Data Security Challenges Recommendations and Solutions

Chapter 14
Big Data Security: Challenges, Recommendations and Solutions

Fatima-Zahra Benjelloun
Ibn Tofail University, Morocco

Ayoub Ait Lahcen
Ibn Tofail University, Morocco

ABSTRACT
The value of Big Data is now being recognized by many industries and governments. The efficient mining of Big Data makes it possible to improve the competitive advantage of companies and to add value for many social and economic sectors. In fact, several governments have launched important projects with huge investments to extract the maximum benefit from Big Data. The private sector has also deployed significant efforts to maximize profits and optimize resources. However, Big Data sharing brings new information security and privacy issues. Traditional technologies and methods are no longer appropriate and perform poorly when applied in a Big Data context. This chapter presents Big Data security challenges and a state of the art in methods, mechanisms and solutions used to protect data-intensive information systems.

DOI: 10.4018/978-1-4666-8387-7.ch014

Copyright © 2015, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

INTRODUCTION

The value of Big Data is now being recognized by many industries and governments. In fact, the efficient mining of Big Data makes it possible to improve competitive advantage and to add value for many sectors (economic, social, medical, scientific research and so on).

Big Data is mainly defined by its three fundamental characteristics, the 3Vs: Velocity (data are growing and changing rapidly), Variety (data come in different and multiple formats) and Volume (a huge amount of data is generated every second) (Wu, Zhu, Wu, & Ding, 2014). According to Berman (2013), these three characteristics must coexist to confirm that a source is a Big Data source. If one of these three Vs does not apply, we cannot speak of Big Data. Berman (2013) and Katal, Wazid, and Goudar (2013) indicate that more Vs and other characteristics have been added by some Big Data actors to better define it: Vision (the defined purpose of Big Data mining), Verification (processed data comply with some specifications), Validation (the purpose is fulfilled), Value (pertinent information can be extracted for the benefit of many sectors), Complexity (it is difficult to organize and analyse Big Data because of evolving data relationships) and Immutability (collected and stored Big Data can be permanent if well managed).

Besides this, some argue that any huge amount of digital data that can no longer be collected and processed adequately through existing infrastructures and technologies is by nature Big Data.

In this chapter, we are interested in the security challenges faced in a Big Data context. We also present a state of the art in several methods, mechanisms and solutions used to protect information systems that handle large data sets.

Big Data security has much in common with the security of traditional information systems (where data are structured). However, Big Data security requires more powerful tools, appropriate methods and advanced technologies for rapid data analysis. It also requires a new security management model that handles in parallel internal data (data produced by internal systems and processes within an organization) and external data (e.g., data collected from other companies or external web sites). Regarding those points, many questions can be raised: i) How to manage and process securely large, unstructured and heterogeneous types of data sets? ii) How to integrate security mechanisms into distributed platforms while ensuring a good performance level (e.g., efficient storage, rapid processing and real-time analysis)? iii) How to analyse massive data streams without compromising data confidentiality and privacy?

This chapter first presents these challenges in detail. Then, it discusses various solutions and recommendations proposed to protect data-intensive information systems.

SECURITY CHALLENGES IN BIG DATA CONTEXT

As mentioned by Kim, Kim, and Chung (2013), security in a Big Data context includes three main aspects: information security, security monitoring and data security. For Lu et al. (2013), managing security in a distributed environment means ensuring Big Data management, system integrity and cyberspace security.

Generally, Big Data security aims to ensure real-time monitoring to detect vulnerabilities, security threats and abnormal behaviours; granular role-based access control; robust protection of confidential information; and the generation of security performance indicators. It supports rapid decision-making when a security incident occurs. The following sections identify and explain a number of challenges to achieving these goals.

Big Data Nature

Because of Big Data velocity and huge volumes, it is difficult to protect all data. Indeed, adding security layers may slow system performance and affect dynamic analysis. Thus, access control and data protection are two “BIG” security problems (Kim et al., 2013). Furthermore, it is difficult to handle the classification and management of large, disparate digital sources. Even though the cost per GB has diminished, Big Data security requires significant investments. In addition to all that, Big Data is most of the time
stored and transferred across multiple Clouds and worldwide distributed systems. Sharing data over many networks increases security risks.

The Need to Share Information

In a context of globalization, business models have to face holistic competition across the world. Thus, enterprises need to build a sustainable advantage through collaboration with many entities, data monetization, and appropriate dynamic data sharing.

For data sharing, digital ecosystems are based on multiple heterogeneous platforms. Such ecosystems aim to ensure real-time data access for many partners, clients, providers and employees. They rely on multiple connections with different levels of security. Data sharing combined with advanced analytics techniques brings multiple security threats, such as the discovery of confidential information (e.g., production processes and methods) or illegal access to network traffic. In fact, by establishing relations between data extracted from different sources, it is possible to identify individuals in spite of data anonymization (e.g., by using correlation attacks, arbitrary identification, intended identification attacks, etc.).

For instance, in the health sector, massive amounts of medical data are shared between hospitals and pharmaceutical laboratories for research and analysis purposes. Such sharing may affect patients’ privacy even if all the medical records are anonymized, for instance by finding correlations between medical records and mutual health insurances (Shin, Sahama, & Gajanayake, 2013).

Multiple Security Requirements

In a Big Data context, one challenge is to handle information security while managing massive and rapid data streams. Thus, security tools should be flexible and easily scalable to simplify the integration of future technological evolutions and to handle changes in application requirements.

There is a need to find a balance between multiple security requirements, privacy obligations, system performance and rapid dynamic analysis on diverse large data sets (data in motion or static, private and public, local or shared, etc.).

Inadequate Traditional Solutions

Traditional security techniques, such as some types of data encryption, slow down performance and are time-consuming in a Big Data context. Furthermore, they are not efficient. Indeed, only small data partitions are processed for security purposes. So most of the time, security attacks are detected after the damage has spread (Lu et al., 2013). Big Data platforms imply the management of various applications and multiple parallel computations. Therefore, performance is a key element for data sharing and real-time analysis in such environments.

New Security Tools Lack of Maturity

The combination of multiple technologies may bring hidden risks that are most of the time not evaluated or are underestimated. In addition, new security tools lack maturity. So, Big Data platforms may incorporate new security risks and vulnerabilities that are not fully assessed.

At the same time, data value is concentrated on various clusters and data centres. Those rich data mines are very attractive for commerce, governments and industry. They constitute a target for many attacks and penetrations. Furthermore, most security risks come from employees, partners and end-point users (more than from third parties). Hence, it is important to deploy advanced security mechanisms to protect Big Data clusters
(Ring, 2013; Jensen, 2013). Regarding this point, data owners have the responsibility to set clear security clauses and policies to be respected by outsourcers.

Data Anonymization

To ensure data privacy and security, data anonymization should be achieved without affecting system performance (e.g., real-time analysis) or data quality. However, traditional anonymization techniques are based on several iterations and time-consuming computations. Several iterations may affect data consistency and slow down system performance, especially when handling huge heterogeneous data sets. In addition, it is difficult to process and analyse anonymized Big Data (it requires costly computations).

Compatibility with Big Data Technologies

Some security techniques are incompatible with commonly used Big Data technologies such as the MapReduce paradigm. To ensure the security and privacy of Big Data, it is not enough just to choose powerful technologies and security mechanisms. It is also mandatory to verify their compatibility with the organization’s Big Data requirements and existing infrastructure components (Zhao et al., 2014).

Information Reliability and Quality

The reliability of data analysis results depends on data quality and integrity (Alvaro, Pratyusa, & Sreeranga, 2013). Therefore, it is important to verify the authenticity and integrity of Big Data sources before analysing data. Since huge volumes of data are generated every second, it is difficult to assess the authenticity and integrity of all the various data sources.

In addition, to extract reliable and complete information from Big Data sources, analysts have to deal with incomplete and heterogeneous data streams coming from different sources in different formats. They have to filter data (e.g., eliminating noise, errors, spam, and so on). They also have to organize and contextualize data (e.g., adding geo-location data) before performing any analysis.

Compliance to Security Laws Regulations and Policies

Private organizations and government agencies have to respect many security laws and industrial standards that aim to enhance the management of digital data security and to protect confidentiality (e.g., deletion of personal data that are no longer used, data protection throughout the life cycle, transaction archiving for legal purposes, citizens’ right to access and modify their data). However, some ICTs may involve entities across many countries. So enterprises have to deal with multiple laws and regulations (Tankard, 2012).

Furthermore, Big Data analytics may conflict with some privacy principles. For example, analysts can correlate many different data sets from different entities to reveal personal or sensitive data even when anonymization techniques are used. Consequently, such analysis may make it possible to identify individuals or to discover confidential information (Alvaro et al., 2013).

Need of Big Data Experts

In the era of Big Data, data analysis is a key factor in preventing and detecting security incidents. However, several surveys confirm that a number of enterprises are not aware of the importance of recruiting data scientists for advanced security analysis. In fact, to ensure Big Data security, organizations
should rely on a multi-disciplinary team with data scientists, mathematicians and programmers trained in security best practices (Constantine, 2014).

Big Data Security on Social Networks

Huge amounts of photos, videos, user comments and clicks are generated from social networks (SNs). They are usually the first source of information for different entities (Sykora, Jackson, O’Brien, & Elayan, 2013).

Big Data on SNs constitutes a valuable mine for governments to better manage national security risks. Indeed, some governments analyse SN Big Data in order to monitor public opinion (e.g., voting intentions, emotions, feelings about a project or an event). They can prevent terrorist and security attacks and assess citizens’ satisfaction regarding public services.

In addition, dynamic analysis of SNs enables crisis committees to optimize and ensure rapid crisis management, for example in the case of disasters. The goal is to rapidly detect abnormal patterns and to ensure real-time monitoring of alarming events.

BIG DATA SECURITY SOLUTIONS

Nowadays, with the spread of social networks, distributed systems, multiple connections and mobile devices, the security of a Big Data information system becomes the responsibility of all actors (e.g., managers, security chiefs, auditors, end-users and customers). In fact, most security threats come from inside users and employees. Thus, it is advisable to raise the security awareness of all parties and to promote security best practices among all the connected entities of the digital ecosystem. It is not sufficient just to integrate security technologies. The collaboration of all actors is required to eliminate the weak link of the system chain and to ensure compliance with security laws and policies.

There exist various security models, mechanisms and solutions for Big Data. However, most of them are not well known or mature. Many research projects are currently striving to enhance their performance (Mahmood & Afzal, 2013). In the following sections, we present some important ones.

Security Foundations for Big Data Projects

For any Big Data project, it is important to consider the strategic priorities related to security and to establish clear organizational guidelines for choosing associated technologies (in terms of reliability, performance, maturity, scalability, and overall cost including maintenance cost). It is also important to consider the constraints related to integration, the existing infrastructure, and the available and planned budget for Big Data security management.

The goal is to ensure agility across all the security systems, solutions, processes and procedures. Organizational agility is important to enable organizations to face rapid changes in security requirements: legal changes, new partners and customers, environment and market changes, technological updates and innovations, new security risks and so on.

After the establishment of security values, strategies and management models, it is important to derive and establish clear security policies, guidelines and user agreements, as well as contractual security clauses to respect when outsourcing Big Data services. All security strategies and policies should take into consideration many factors. First of all, it is essential to establish Big Data classification and management principles guided by a long-term vision. In fact, data classification is mandatory to determine the sensitive data and valuable information to protect, and to define data owners with their security policies, requirements and responsibilities. Then, the security strategy should be based on the assessment of security
risks related to the different Big Data management processes (e.g., data generation, storage, transfer, exchange, access, deletion, modification and so on). Finally, a security level has to be determined for each data category according to the organizational strategy. In fact, Bodei, Degano, Ferrari, Galletta, and Mezzetti (2012) recommend identifying the data attributes to protect and to encrypt at the beginning of the system design phase.

Furthermore, the organization has to keep track of legal changes to update the organizational policies and procedures. To ensure continuous legal compliance, it is important to involve the legal department in the development of Big Data projects and in the upgrading of all the security policies, including the retention period of personal information, access permissions, data transfer abroad, data access and exchange between many stakeholders with different security requirements, integration of contractual security requirements, and data conservation or destruction.

Risk Analysis Related to Multiple Technologies

It is important to study and assess the security risks related to the mix of multiple technologies inside a Big Data platform. It is not sufficient to evaluate the security risks of each technology separately. In fact, the integration of disparate technologies for multiple purposes may bring hidden risks and unknown security threats.

In addition, with the increasing spread of the Cloud and BYOD (Bring Your Own Device), it is important to consider security threats related to distributed environments and to the use of non-normalized mobile and personal devices for professional purposes. On this point, Ring (2013) recommends protecting the many disparate end-points with an extra security layer. Furthermore, mobile devices should be normalized to fulfil organizational and industrial security standards.

Choosing Adequate Security Solutions

To enhance Big Data security, organizations rely on advanced dynamic security analysis. The goal is to extract and analyse, in real time or near real time, security events and related user actions in order to enhance online and transactional security and to prevent fraudulent attacks.

Such dynamic analysis of the generated Big Data helps to detect security incidents in a timely manner, to identify abnormal customer behaviours, to monitor security threats, to discover known and new cyber-attack patterns and so on. Hence, dynamic analysis of Big Data enables improved prevention and rapid reaction, which supports good security decisions. In parallel, the analysis of the statistics generated by applications and programs makes it possible to produce security performance indicators and to monitor and secure program behaviours.

Kim et al. (2013) recommend protecting the data values instead of the data themselves. In fact, it is too difficult, and nearly impossible, to protect huge data sets. Furthermore, Big Data security analysis techniques are based on attribute information. Thus, the data owners or operators have the responsibility to define, select and protect only the important attributes that they consider valuable for their use cases. To protect such important data attributes in a Big Data context, Kim et al. (2013) present the following steps:

• Evaluate the importance of the attributes, and compare and evaluate the correlations between them.
• Filter and define the valuable attributes to protect.
• Choose security mechanisms to protect the relevant attributes according to the data owner’s or the organization’s policies.
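
The attribute-protection steps listed above can be illustrated with a small sketch. It is not the procedure of Kim et al. (2013): the importance scores, the thresholds, the Pearson correlation measure and the names `select_attributes` and `assign_mechanisms` are all assumptions chosen for illustration. The idea is that an attribute is protected either because its owner rates it as important or because it is strongly correlated with an important attribute (and could therefore reveal it).

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if undefined)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def select_attributes(columns, importance, imp_threshold=0.7, corr_threshold=0.9):
    """Steps 1 and 2: evaluate importance and correlations, then filter.

    columns: attribute name -> list of numeric values (hypothetical data layout)
    importance: attribute name -> owner-assigned score in [0, 1] (assumption)
    """
    valuable = {a for a, score in importance.items() if score >= imp_threshold}
    linked = set()
    for a in valuable:
        for b in columns:
            if b not in valuable and abs(pearson(columns[a], columns[b])) >= corr_threshold:
                linked.add(b)  # b could leak a, so it is protected as well
    return valuable | linked


def assign_mechanisms(attributes, policy, default="access-control"):
    """Step 3: map each selected attribute to a mechanism named in the owner's policy."""
    return {a: policy.get(a, default) for a in sorted(attributes)}


# Toy data set: tax_paid is fully determined by salary, age is unrelated.
columns = {
    "salary": [30, 45, 60, 80],
    "tax_paid": [6, 9, 12, 16],
    "age": [25, 52, 31, 40],
}
importance = {"salary": 0.9, "tax_paid": 0.3, "age": 0.2}

protected = select_attributes(columns, importance)
plan = assign_mechanisms(protected, {"salary": "encryption"})
print(plan)
```

In this toy data set, `tax_paid` is selected even though its own importance score is low, because it is perfectly correlated with `salary`; `age` is left unprotected.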


Several analytical solutions are available to secure Big Data, such as the security solutions of Accenture, HP, IBM, CISCO, Unisys and EADS. They bring different levels of performance and several benefits: they enable agile decision-making and rapid reaction through real-time surveillance and monitoring; they detect dynamic attacks with enhanced reliability (low false-positive rate) thanks to the analysis of active and passive security information; and they provide full visibility of the network status and of applications’ security problems.

Mahmood and Afzal (2013) identify the deployment phases of Big Data analytics solutions. First of all, they recommend identifying strategic priorities and Big Data security analysis goals. Then, the organizational priorities should provide guidelines to develop a more detailed strategy for the deployment of the Big Data analysis platform. The purpose is to optimize the selection of security solutions according to the strategic goals, the triple constraint (overall cost, quality, duration of the implementation), the added value and the available features (performance, reliability, scalability and so on). Regarding organizational strategy, they recommend adopting centralized Big Data management for better security outcomes.

Before Big Data analysis, one prerequisite is to consider the integrity and authenticity of data sources. Indeed, Big Data sources may contain errors, noise, incomplete data, or data without context information. Consequently, it is essential to filter and prepare data and to add context data before applying analysis techniques.

Anonymization of Confidential or Personal Data

Data anonymization is a recognized technique used to protect data privacy across the Cloud and distributed systems. Several models and solutions are used to implement this technique, such as sub-tree data anonymization, t-closeness, m-invariance, k-anonymity and l-diversity.

Sub-tree techniques are based on two methods: Top-Down Specialization (TDS) and Bottom-Up Generalization (BUG). However, those methods are not scalable. They perform poorly for certain anonymization parameters, and they cannot scale when applied to anonymize Big Data on distributed systems.

In order to improve the anonymization of valuable information extracted from large data sets, Zhang et al. (2014) suggest a hybrid approach that combines both anonymization techniques, TDS and BUG. This approach automatically selects and applies whichever of the two techniques is suitable for the use case parameters. Thus, this hybrid approach provides the efficiency, performance and scalability required to anonymize huge databases. It is supported by programs newly adapted to the MapReduce paradigm. It makes it possible to reduce computation time in distributed systems or the Cloud.

Big Data processing and analysis are based most of the time on many iterations to obtain reliable and precise results, which may slow down computations and the performance of security solutions. Regarding this, Zhang et al. (2014) propose a method that performs generalization operations in a single iteration. The goal is to enhance the parallelism capacities, performance and scalability of anonymization techniques.

Currently, many projects are working to develop new techniques and to improve existing ones to protect privacy. As an example, some projects are based on privacy-preservation-aware analysis and scheduling techniques for large data sets.

Data Cryptography

Data encryption is a common solution used to ensure the confidentiality of data and Big Data. Much research has been conducted to improve the performance and reliability of traditional techniques or to create new Big Data encryption techniques.


Unlike some traditional encryption techniques, Homomorphic Cryptography enables computation even on encrypted data. Consequently, this technique ensures information confidentiality while allowing useful insight to be extracted through certain analyses and computations on the encrypted data.

Regarding this solution, Chen and Huang (2013) propose an adapted platform to handle MapReduce computations in the case of Homomorphic Cryptography. To ensure the performance of cryptographic solutions in distributed environments, Liu et al. (2013) suggest a new approach for key exchange called CBHKE (Cloud Background Hierarchical Key Exchange). It is a secure solution that is faster than its predecessor techniques (IKE and CCBKE). It is based on an iterative strategy for Authenticated Key Exchange (AKE) through two phases (layer by layer). However, new approaches with enhanced performance are still needed to improve the encryption of large data sets on distributed systems.

Centralized Security Management

Kasim, Hung, and Li (2012) recommend storing data on the Cloud rather than on mobile devices. The goal is to take advantage of the normalized, standards-compliant infrastructure and the centralized security mechanisms of the Cloud. Indeed, Cloud platforms are regularly updated and continuously monitored for enhanced security.

However, “zero risk” is hard to achieve. In fact, data security lies in the hands of the Cloud outsourcers and operators. In addition, the Cloud is very attractive for attackers as it is a centralized mine of valuable data. Data owners and managers should be aware of the security risks and define clear data access policies. They have to make sure that the required security level is guaranteed when outsourcing Big Data management, storage or processing.

Furthermore, it is essential to change the traditional governance concept where only security managers and chiefs are accountable. It is more appropriate to adopt centralized security governance to meet the challenges of securing Big Data sources in distributed environments. The organization should involve all the stakeholders connected to its ecosystem, including employees, managers, ISR, operators, users, customers, partners, suppliers, outsourcers and so on. The goal is to make all the parties accountable for security management in order to enhance the adoption of security best practices and to ensure compliance with standards and laws. Users should be aware of threats, regulations and policies.

As an example, partners have to respect data access and confidentiality policies. Users have to update their systems regularly and make sure that their mobile devices respect standards and recommended security practices and regulations. Users also have to avoid installing unreliable components (e.g., counterfeit software, software without a valid license). On the other hand, programmers, architects and designers of Big Data applications have to integrate security and privacy requirements throughout the development life cycle. The outsourcers should be made accountable for Big Data security through clear security clauses.

Data Confidentiality and Data Access Monitoring

Security threats are spreading because of the increasing data exchange over distributed systems and the Cloud. To face these security challenges, Tankard (2012) proposes to enhance control by integrating controls at the data level and during the storage phase. In fact, it has been shown that controls at the application and system levels are not sufficient.
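
To make the idea of controls at the data level more concrete, the following minimal sketch attaches a sensitivity label to every field when a record is stored and filters fields by the reader's clearance when it is read. It is only an illustration under stated assumptions, not the mechanism described by Tankard (2012): the label names, the role-to-clearance table and the functions `store` and `read` are hypothetical.

```python
# Hypothetical sensitivity levels, ordered from least to most restrictive.
LEVELS = {"public": 0, "internal": 1, "confidential": 2}

# Hypothetical mapping from role to the highest level that role may read.
ROLE_CLEARANCE = {"guest": "public", "analyst": "internal", "auditor": "confidential"}


def store(record, labels):
    """Attach a label to each field at storage time.

    Fields without an explicit label default to the most restrictive level,
    so forgetting to classify a field never exposes it.
    """
    return {
        field: {"value": value, "label": labels.get(field, "confidential")}
        for field, value in record.items()
    }


def read(stored, role):
    """Return only the fields whose label is within the role's clearance.

    Unknown roles are treated as having 'public' clearance only.
    """
    clearance = LEVELS[ROLE_CLEARANCE.get(role, "public")]
    return {
        field: cell["value"]
        for field, cell in stored.items()
        if LEVELS[cell["label"]] <= clearance
    }


stored = store(
    {"name": "A. Patient", "diagnosis": "flu", "ward": "B2"},
    {"name": "internal", "ward": "public"},  # 'diagnosis' is left unlabeled
)
print(read(stored, "guest"))
print(read(stored, "analyst"))
print(read(stored, "auditor"))
```

Because the label travels with the data, the check still applies when the record is read through a different application, which is the point of placing the control at the data level instead of only at the application level.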


In addition, access controls have to be fine-grained to limit access by role and responsibility. There exist many techniques to ensure access control and data confidentiality, such as PKI, certificates, smart cards, federated identity management and multi-factor authentication.

For example, the INDECT project, a research project funded by the European Union, aims to implement a secure infrastructure for data exchange between Law Enforcement Agencies (LEA) and other members (Stoianov, Uruena, Niemiec, Machnik, & Maestro, 2013). The solution includes:

• A Public Key Infrastructure (PKI) with three levels (certification authority, users and machines). The PKI provides access control based on multi-factor authentication and the security level required for each data type. For instance, access to highly confidential applications requires a valid certificate and a password.
• Federated Identity Management, a concept used by the INDECT platform to enhance access control and security. This type of federated management is delegated to an Identity Provider (IdP) within a monitored trust domain. It is based on two security tools: certificates and smart cards. Those tools are used to store user certificates issued by the PKI to encrypt and sign documents and emails.
• The INDECT Block Cipher (IBC), a new algorithm for symmetric cryptography. It was developed and used to encrypt databases, communication sessions (TLS/SSL) and VPN tunnels. The goal is to ensure a high level of data confidentiality.
• Secured communications based on VPN and TLS/SSL protocols. Those mechanisms are used to protect access to Big Data servers.

Authentication mechanisms are, most of the time, complex and heavy to handle across distributed clusters and large data sets. For this reason, Zhao et al. (2014) suggest a security model for G-Hadoop that integrates several security solutions. It is based on the Single Sign-On (SSO) concept, which simplifies user authentication and the computation of MapReduce functions. Thus, this model enables users to access different clusters with the same account identifier. Furthermore, privacy is protected through encrypted connections based on the SSL protocol, public key cryptography and valid certificates for authentication. Hence, this model offers efficient access control and protection against hackers and attackers (e.g., MITM attacks, version rollback, delay attacks) and denies access to fraudulent or untrusted entities.

Mansfield-Devine (2012) recommends involving not just security chiefs but also making all end users responsible for better access control. He also suggests combining different types of controls inside multi-silo environments (e.g., archives, data loss prevention, access control, logs).

Security Surveillance and Monitoring

It is important to ensure continuous surveillance in order to detect security incidents, threats and abnormal behaviours in real time. To ensure Big Data security surveillance, some solutions are available, such as Data Loss Prevention (DLP), Security Information and Event Management (SIEM) and dynamic analysis of security events. Such solutions are based on methods for consolidating and correlating multiple data sources, and on contextualization tools (which add context as a data attribute to the extracted data). It is also important to conduct regular audits and to verify that users and employees respect security policies and recommended best practices.
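
The consolidation, contextualization and correlation steps mentioned above can be sketched as follows. This is a toy illustration, not the behaviour of any particular SIEM product: the event fields, the `asset_owner` lookup, the 60-second window and the rule (failed logins for the same user reported by at least two distinct sources) are assumptions made for the example.

```python
from collections import defaultdict


def contextualize(event, source, asset_owner):
    """Add context attributes (originating source, owner of the target host)."""
    return {**event, "source": source, "owner": asset_owner.get(event["host"], "unknown")}


def correlate(events, window=60, min_sources=2):
    """Flag users whose failed logins are reported by at least `min_sources`
    distinct sources within `window` seconds. At most one alert per user."""
    by_user = defaultdict(list)
    for e in events:
        if e["action"] == "login_failed":
            by_user[e["user"]].append(e)

    alerts = []
    for user, evs in by_user.items():
        evs.sort(key=lambda e: e["ts"])
        for i, first in enumerate(evs):
            in_window = [e for e in evs[i:] if e["ts"] - first["ts"] <= window]
            sources = {e["source"] for e in in_window}
            if len(sources) >= min_sources:
                alerts.append({
                    "user": user,
                    "start": first["ts"],
                    "sources": sorted(sources),
                    "owners": sorted({e["owner"] for e in in_window}),
                })
                break
    return alerts


# Two hypothetical log sources, consolidated into one event list.
asset_owner = {"db1": "finance"}
firewall = [
    {"ts": 100, "user": "alice", "host": "db1", "action": "login_failed"},
    {"ts": 400, "user": "bob", "host": "web1", "action": "login_failed"},
]
vpn = [
    {"ts": 130, "user": "alice", "host": "db1", "action": "login_failed"},
    {"ts": 900, "user": "bob", "host": "web1", "action": "login_failed"},
]
events = [contextualize(e, "firewall", asset_owner) for e in firewall]
events += [contextualize(e, "vpn", asset_owner) for e in vpn]

alerts = correlate(events)
print(alerts)
```

Here only the failures for `alice` are flagged, because both sources report them within the window; the two failures for `bob` are 500 seconds apart and are not correlated.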


EXAMPLE: SECURITY OF SMART GRID BIG DATA

This section presents security challenges and solutions regarding Big Data processed in Smart Grid infrastructures.

Given the growing power demand, a Smart Grid gathers and processes huge data sets generated daily (e.g., consumers’ behaviours and habits) to ensure an efficient and cost-effective production and distribution of electricity. Unlike the traditional electrical grid, a Smart Grid is based on advanced technologies and enables bidirectional power flow between connected devices. However, Smart Grid infrastructures are vulnerable and face many security threats. In fact, data are transferred massively in this context and security attacks may have serious and large-scale impacts (e.g., regional or national interruption in power supply, important economic loss, disruption of public services, low service quality in hospitals, etc.).

Hence, securing the Big Data of Smart Grid infrastructures is fundamental to protect grid system performance, to ensure reliable coordination between control centres and equipment, and to enhance the safety of all system operations.

Smart Grid Security Challenges

Security challenges facing critical large-scale infrastructures can be classified according to various parameters. (Pathan, 2014) recommends a holistic multi-layered security approach to deal with this issue at all grid levels (i.e., physical, cyber and data). In fact, a Smart Grid environment incorporates many distributed subsystems and end-points, including distributed sensors and actuators, an ever-growing number of Intelligent Electronic Devices (IED) with different power requirements (e.g., electric vehicles and smart homes), electric generators and control applications. In addition, these subsystems and end-points have multiple bidirectional communications between each other.

This complexity increases anomalies, human errors and system vulnerabilities that can be exploited by attackers. As an example, the governor control system (GC) of a Smart Grid ensures the steady operation of all power generators. It uses sensors to detect the speed and the frequency deviation of generators and rapidly adjusts their operation. Attackers may succeed in accessing one point of the network communications and changing the values recorded by the generators’ sensors. Such communication intrusion and malicious modification of data could affect power flow inside the grid and compromise the decision-making process supervised by the Optimal Power Flow (OPF).

Moreover, attackers may use Grid utilities and smart meters to get private information, which compromises consumer privacy (Stimmel, 2014).

Since a Smart Grid system incorporates multiple interconnected subsystems, there is a risk of cascading failures. In fact, any attack on one of the subsystems may compromise the security of the other ones. Thus, it is crucial to secure the overall Smart Grid system, including the physical components, the cyber space and the data generated and transmitted through the grid networks.

Security Solutions for Smart Grid

Big Data security management in a Smart Grid environment aims to ensure data confidentiality, integrity and availability for efficient and reliable grid operations. To increase the attack resilience of the Smart Grid and to respond to the challenges cited above, several solutions have been proposed and several research efforts have been made.

Considering the cascading failure aspect and the complexity of the Smart Grid system, it is recommended to adopt a multi-layered approach. This helps to secure not only the data layer but also all the other layers: physical, networks, hosts, data stores, applications, policies and regulations. For instance, at the physical layer, it is important to ensure the security of equipment, consumer
devices, substations, sensors, control centres and so on. Concerning the cyber layer, it is recommended to secure network communications and eliminate weak points from the cyber topology (e.g., bad intrusion detection algorithms). Regarding the data layer, it is crucial to ensure granular access control and granular audits (Alvaro, Pratyusa, & Sreeranga, 2013).

Successful security management of a Smart Grid system should incorporate real-time security monitoring. Big Data analytics algorithms are among the powerful solutions recommended to address this need. Those algorithms drive improved predictions and more precise analysis. They are often based on Machine Learning techniques and use not only traditional security logs and events, but also performance and customers’ data to recognize and prevent malicious behaviours (Khorshed, Ali, & Wasimi, 2014).

Real-time monitoring can be part of a mitigation strategy that supports security decision making. It assists security analysts in deciding whether a preventive or remedial action should be taken (e.g., change user roles or privileges, suspend suspicious access, correct network configurations). The list of actions depends on the nature of the incident and its impact. The actions can be implemented through an automatic or semi-automatic process. Ensuring continuous updates of such a strategy is a good practice. This helps to integrate new security solutions, new laws as well as emerging security practices and models.

Considering the complexity and evolving nature of cybercrimes, it is important to promote continuous commitment and timely sharing of security information between all Smart Grid partners, utilities, and organizations specialized in cybercrime (Bughin, Chui, & Manyika, 2010).

It is well known that security techniques alone are not sufficient in any industry. They have to be guided by legal actions and regulations in order to protect the valuable data and other assets. Therefore, security requirements, policies, regulations and standards should be updated regularly to consider evolving security issues.

CONCLUSION

Big Data applications promise interesting opportunities for many sectors. In fact, extracting valuable insight and information from disparate large data sources enables organizations to improve their competitive advantage. For instance, the analysis of data streams or archives (e.g., using predictive or identification models) can help to optimize production processes, to enhance services with added value and to adapt them to customers’ needs. However, Big Data sharing and analysis raise many security issues and increase privacy threats. This chapter presents some of the important Big Data security challenges and describes related solutions and recommendations. Because it is nearly impossible to secure very large data sets, it is more practical to protect the data value and its key attributes instead of the data itself, to analyse security risks of combining different evolving Big Data technologies and to choose security tools according to the goals of the Big Data project.

REFERENCES

Alvaro, A. C., Pratyusa, K. M., & Sreeranga, P. R. (2013). Big data analytics for security. IEEE Security and Privacy, 11(6), 74–76. doi:10.1109/MSP.2013.138

Berman, J. J. (2013). Principles of big data: Preparing, sharing, and analyzing complex information. San Francisco, CA: Morgan Kaufmann Publishers Inc.

Bodei, C., Degano, P., Ferrari, G. L., Galletta, L., & Mezzetti, G. (2012). Formalising security in ubiquitous and cloud scenarios. In A. Cortesi, N. Chaki, K. Saeed, & S. Wierzchon (Eds.), Computer information systems and industrial management: Proceedings of the 11th IFIP TC 8 International Conference (LNCS) (Vol. 7564, pp. 1-29). Berlin, Germany: Springer. doi:10.1007/978-3-642-33260-9_1

Bughin, J., Chui, M., & Manyika, J. (2010). Clouds, big data, and smart assets: Ten tech-enabled business trends to watch. Retrieved from http://www.mckinsey.com/insights/high_tech_telecoms_internet/clouds_big_data_and_smart_assets_ten_tech-enabled_business_trends_to_watch

Chen, X., & Huang, Q. (2013). The data protection of mapreduce using homomorphic encryption. In Proceedings of the 4th IEEE International Conference on Software Engineering and Service Science (pp. 419-421). Beijing, China: IEEE.

Constantine, C. (2014). Big data: An information security context. Network Security, 2014(1), 18–19. doi:10.1016/S1353-4858(14)70010-8

Jensen, M. (2013). Challenges of privacy protection in big data analytics. In Proceedings of IEEE International Congress on Big Data (pp. 235-238). Washington, DC: IEEE Computer Society. doi:10.1109/BigData.Congress.2013.39

Kasim, H., Hung, T., & Li, X. (2012). Data value chain as a service framework: For enabling data handling, data security and data analysis in the cloud. In Proceedings of the 18th IEEE International Conference on Parallel and Distributed Systems (pp. 804-809). Washington, DC: IEEE Computer Society. doi:10.1109/ICPADS.2012.131

Katal, A., Wazid, M., & Goudar, R. (2013). Big data: Issues, challenges, tools and good practices. In Proceedings of the Sixth International Conference on Contemporary Computing (pp. 404-409). Noida, India: IEEE. doi:10.1109/IC3.2013.6612229

Khorshed, M. T., Ali, A. B., & Wasimi, S. A. (2014). Combating Cyber Attacks in Cloud Systems Using Machine Learning. In S. Nepal & M. Pathan (Eds.), Security, Privacy and Trust in Cloud Systems (pp. 407–431). Berlin, Germany: Springer. doi:10.1007/978-3-642-38586-5_14

Kim, S. H., Kim, N. U., & Chung, T. M. (2013). Attribute relationship evaluation methodology for big data security. In Proceedings of the International Conference on IT Convergence and Security (pp. 1-4). Macao, China: IEEE. doi:10.1109/ICITCS.2013.6717808

Liu, C., Zhang, X., Liu, C., Yang, Y., Ranjan, R., Georgakopoulos, D., & Chen, J. (2013). An iterative hierarchical key exchange scheme for secure scheduling of big data applications in cloud computing. In Proceedings of the 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications (pp. 9-16). Washington, DC: IEEE Computer Society. doi:10.1109/TrustCom.2013.65

Lu, T., Guo, X., Xu, B., Zhao, L., Peng, Y., & Yang, H. (2013). Next big thing in big data: The security of the ict supply chain. In Proceedings of the International Conference on Social Computing (pp. 1066-1073). Washington, DC: IEEE Computer Society. doi:10.1109/SocialCom.2013.172

Mahmood, T., & Afzal, U. (2013). Security analytics: Big data analytics for cybersecurity: A review of trends, techniques and tools. In Proceedings of the 2nd National Conference on Information Assurance (pp. 129-134). Rawalpindi, Pakistan: IEEE. doi:10.1109/NCIA.2013.6725337

Mansfield-Devine, S. (2012). Using big data to reduce security risks. Computer Fraud & Security, (8): 3–4.

Pathan, A. S. K. (2014). The state of the art in intrusion prevention and detection. Boca Raton, FL: Auerbach Publications. doi:10.1201/b16390

Ring, T. (2013). It’s megatrends: The security impact. Network Security, 2013(7), 5–8. doi:10.1016/S1353-4858(13)70080-1

Shin, D., Sahama, T., & Gajanayake, R. (2013). Secured e-health data retrieval in daas and big data. In Proceedings of the 15th IEEE international conference on e-health networking, applications services (pp. 255-259). Lisbon, Portugal: IEEE. doi:10.1109/HealthCom.2013.6720677

Stimmel, C. L. (2014). Big data analytics strategies for the smart grid. Boca Raton, FL: Auerbach Publications. doi:10.1201/b17228

Stoianov, N., Uruena, M., Niemiec, M., Machnik, P., & Maestro, G. (2012). Security Infrastructures: Towards the INDECT System Security. Paper presented at the 5th International Conference on Multimedia Communication Services & Security (MCSS), Krakow, Poland. doi:10.1007/978-3-642-30721-8_30

Sykora, M., Jackson, T., O’Brien, A., & Elayan, S. (2013). National security and social media monitoring: A presentation of the emotive and related systems. In Proceedings of the European Intelligence and Security Informatics Conference (pp. 172-175). Uppsala, Sweden: IEEE. doi:10.1109/EISIC.2013.38

Tankard, C. (2012). Big data security. Network Security, (7): 5–8.

Wu, X., Zhu, X., Wu, G.-Q., & Ding, W. (2014). Data mining with big data. IEEE Transactions on Knowledge and Data Engineering, 26(1), 97–107. doi:10.1109/TKDE.2013.109

Zhang, X., Liu, C., Nepal, S., Yang, C., Dou, W., & Chen, J. (2014). A hybrid approach for scalable sub-tree anonymization over big data using mapreduce on cloud. Journal of Computer and System Sciences, 80(5), 1008–1020. doi:10.1016/j.jcss.2014.02.007

Zhao, J., Wang, L., Tao, J., Chen, J., Sun, W., & Ranjan, R. et al. (2014). A security framework in g-hadoop for big data computing across distributed cloud data centres. Journal of Computer and System Sciences, 80(5), 994–1007.

KEY TERMS AND DEFINITIONS

Anonymization: Anonymization is the process of protecting data privacy across information systems. Several models and methods are used to implement it, such as t-closeness, m-invariance, k-anonymity and l-diversity.

Authentication: Authentication aims to test and ensure, with a certain probability, that particular data are authentic, i.e., they have not been changed.

Big Data: Big Data is mainly defined by its 3Vs fundamental characteristics. The 3Vs include Velocity (data are growing and changing in a rapid way), Variety (data come in different and multiple formats) and Volume (huge amount of data is generated every second).

Confidentiality: Confidentiality is a property that ensures that data is not disclosed to unauthorized persons. It enforces predefined rules while accessing the protected data.

Encryption: Encryption relies on the use of encryption algorithms to transform data into encrypted forms. The purpose is to make them unreadable to those who do not possess the encryption key(s).

Privacy: Privacy is the ability of individuals to seclude information about themselves. In other words, they selectively control its dissemination.

Security Management: Security management is a part of the overall management system of an organization. It aims to handle, implement, monitor, maintain, and enhance data security.
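
As a hedged illustration of the Authentication entry, and of the sensor-tampering scenario described in the Smart Grid example, the following Python sketch shows how a keyed hash (HMAC-SHA256) could let a control centre detect that a generator reading was modified in transit. It is not a method proposed in this chapter. The shared key, the field names (generator_id, frequency_hz, speed_rpm) and the helper functions sign_reading and verify_reading are all assumptions made for the example, and a real deployment would also need key distribution and replay protection.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret, assumed to be provisioned to both the
# sensor and the control centre (illustration only).
SECRET_KEY = b"demo-key-not-for-production"


def sign_reading(reading: dict, key: bytes = SECRET_KEY) -> str:
    """Return a hex HMAC-SHA256 tag over a canonical JSON encoding."""
    # Sorting keys makes the encoding independent of field order.
    payload = json.dumps(reading, sort_keys=True, separators=(",", ":"))
    return hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_reading(reading: dict, tag: str, key: bytes = SECRET_KEY) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = sign_reading(reading, key)
    return hmac.compare_digest(expected, tag)


# Hypothetical reading sent by a generator sensor.
reading = {"generator_id": "G1", "frequency_hz": 49.98, "speed_rpm": 3000}
tag = sign_reading(reading)

# An attacker on the network alters the frequency but cannot forge the tag.
tampered = dict(reading, frequency_hz=50.40)

print(verify_reading(reading, tag))
print(verify_reading(tampered, tag))
```

The check only establishes that the reading matches what the key holder signed. It does not hide the reading, so confidentiality would still require encryption, for example the TLS-protected channels mentioned earlier in the chapter.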
