Professional Documents
Culture Documents
Big Data Security Challenges Recommendations and Solutions
Big Data Security Challenges Recommendations and Solutions
Big Data Security Challenges Recommendations and Solutions
Chapter 14
Big Data Security:
Challenges, Recommendations
and Solutions
Fatima-Zahra Benjelloun
Ibn Tofail University, Morocco
ABSTRACT
The value of Big Data is now being recognized by many industries and governments. The efficient min-
ing of Big Data enables to improve the competitive advantage of companies and to add value for many
social and economic sectors. In fact, important projects with huge investments were launched by sev-
eral governments to extract the maximum benefit from Big Data. The private sector has also deployed
important efforts to maximize profits and optimize resources. However, Big Data sharing brings new
information security and privacy issues. Traditional technologies and methods are no longer appropri-
ate and lack of performance when applied in Big Data context. This chapter presents Big Data security
challenges and a state of the art in methods, mechanisms and solutions used to protect data-intensive
information systems.
DOI: 10.4018/978-1-4666-8387-7.ch014
Copyright © 2015, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
Big Data Security
istics have been added by some Big Data actors This chapter presents first these challenges
to better define it: Vision (the defined purpose of in detail. Then, it discusses various solutions
Big Data mining), Verification (processed data and recommendations proposed to protect data-
comply to some specifications), Validation (the intensive information systems.
purpose is fulfilled), Value (pertinent information
can be extracted for the benefit of many sectors),
Complexity (it is difficult to organize and analyse SECURITY CHALLENGES
Big data because of evolving data relationships) IN BIG DATA CONTEXT
and Immutability (collected and stored Big Data
can be permanent if well managed). As mentioned by (Kim, Kim, & Chung, 2013),
Beside this, some argue when defining Big security in Big Data context includes three main
Data, that any huge amount of digital data sets that aspects: information security, security monitoring
we can no longer collect and process adequately, and data security. For (Lu et al., 2013), managing
through the existing infrastructures and technolo- security in a distributed environment means to
gies, are by nature Big Data. ensure Big Data management, system integrity
In this chapter, we are interested in security and cyberspace security.
challenges faced in Big Data context. We present Generally, Big Data security aims to ensure
also a state of the art in several methods, mecha- a real-time monitoring to detect vulnerabilities,
nisms and solutions used to protect information security threats and abnormal behaviours; a granu-
systems that handle large data sets. lar role-based access control; a robust protection
Big Data security has many common points of confidential information and a generation of
with the security of traditional information systems security performance indicators. It supports rapid
(where data are structured). However, Big Data decision-making in a security incident case. The
security requires more powerful tools, appropriate following sections identify and explain a number
methods and advanced technologies for rapid data of challenges to achieve these goals.
analysis. It requires also a new security manage-
ment model that handles in parallel internal data Big Data Nature
(data produced by internal systems and processes
within an organization) and external data (e.g., Because of Big Data velocity and huge volumes,
data collected from other companies or external it is difficult to protect all data. Indeed, adding
web sites). Regarding those points, many ques- security layers may slow system performances
tions can be raised: i) How to manage and process and affect dynamic analysis. Thus, access con-
securely large, unstructured and heterogeneous trol and data protection are two “BIG” security
types of data sets? ii) How to integrate security problems (Kim et al., 2013). Furthermore, it is
mechanisms into distributed platforms while en- difficult to handle data classification and man-
suring a good performance level (e.g., efficient agement of large digital disparate sources. Even
storage, rapid processing and real-time analysis)? though that the cost by GB has diminished, Big
iii) How to analyse massive data streams without Data security requires important investments. In
compromising data confidentiality and privacy? addition to all that, Big Data is most of the time
302
Big Data Security
stored and transferred across multiple Clouds and flexible and easily scalable to simplify the inte-
distributed worldwide systems. Sharing data over gration of future technological evolutions and to
many networks increase security risks. handle applications requirements’ changes.
There is a need to find a balance between mul-
The Need to Share Information tiple security requirements, privacy obligations,
system performance and rapid dynamic analysis
In globalization context, business models have to on divers large data sets (data in motion or static,
face holistic competition across the world. Thus, private and public, local or shared, etc.).
enterprises need to build a sustainable advantage
through collaboration with many entities, data Inadequate Traditional Solutions
monetization, and appropriate dynamic data
sharing. Traditional Security techniques, such as some
For data sharing, digital ecosystems are based types of data encryption, slow the performance
on multiple heterogeneous platforms. Such eco- and are time-consuming in Big Data context.
systems aim to ensure real time data access for Furthermore, they are not efficient. Indeed, just
many partners, clients, providers and employees. small data partitions are processed for security
They rely on multiple connections with different purposes. So most of the time, security’s attacks
levels of securities. Data sharing associated to are detected after the spread of the damage (Lu et
advanced analytics techniques brings multiple al., 2013). Big Data platforms imply the manage-
security threats such us: discovering confidential ment of various applications and multiple parallel
information (e.g., process and method of produc- computations. Therefore, the performance is a key
tions) or illegal access to networks’ traffics. In fact, element for data sharing and real-time analysis in
by establishing relations between extracted data such environments.
from different sources, it is possible to identify
individuals in spite of data anonymization (e.g., by New Security Tools Lack of Maturity
using correlation attacks, arbitrary identification,
intended identification attacks, etc.). The combination of multiple technologies may
For instance, in health sector, massive amounts bring hidden risks that are most of the time not
of medical data are shared between hospitals evaluated or under-estimated. In addition, new
and pharmaceutical laboratories for research security tools lack maturity. So, Big Data platforms
and analysis purposes. Such sharing may affect may incorporate new security risk and vulner-
patient’s privacy even if all the medical records abilities that are not fully assessed.
are anonymized, by finding for instance correla- At the same time, data value is concentrated
tions between medical records and mutual health on various clusters and data centres. Those rich
insurances (Shin, Sahama, & Gajanayake, 2013). data mines are very attractive for commerce, gov-
ernments and industrials. They constitute a target
Multiple Security Requirements of several attacks and penetrations. Furthermore,
most of security risks come from employees,
In Big Data context, one challenge is to handle partners and end-point users (more than a third-
information security while managing massive and part). Hence, it is important to deploy advanced
rapid data streams. Thus, security tools should be security mechanisms to protect Big Data clusters
303
Big Data Security
(Ring, 2013; Jensen, 2013). Regarding this point, In addition, to extract reliable and complete
data owners have the responsibilities to set clear information from Big Data sources, the analysts
security clauses and policies to be respected by have to deal with incomplete and heterogeneous
outsources. data streams coming from different sources in
different formats. They have to filter data (e.g.,
Data Anonymization eliminating noises, errors, spams, and so on).
They have also to organize and contextualize data
To ensure data privacy and security, data anony- (e.g., adding geo-location data) before performing
mization should be achieved without affecting any analysis.
system performance (e.g., real-time analysis) or
data quality. However, traditional anonymization Compliance to Security Laws
techniques are based on several iterations and time Regulations and Policies
consuming computations. Several iterations may
affects data consistency and slow down system Private organizations and government agencies
performance specially when handling huge het- have to respect many security laws and industrial
erogeneous data sets. In addition, it is difficult to standards that aim to enhance the management of
process and analyse anonymized Big Data (they digital data security and to protect confidentiality
need costly computations). (e.g., delete personal data if no more used, data
protection through its life cycle, transactions
Compatibility with Big archiving for legal purposes, citizens’ right to
Data Technologies access and modify their data). However, some
ICTs may involve entities across many countries.
Some security techniques are incompatible with So enterprises have to deal with multiple laws and
commonly used Big Data technologies like Ma- regulations (Tankard, 2012).
pRecude paradigm. To ensure security and privacy Furthermore, Big Data analytics may be in
of Big Data, it is not enough just to choose powerful conflict with some privacy principles. For ex-
technologies and security mechanisms. It is also ample, analysts can correlate many different data
mandatory to verify their compatibility with the sets from different entities to reveal personal or
organization Big Data requirements and existing sensible data even with anonymization techniques.
infrastructure components (Zhao et al., 2014). Consequently, such analysis may enable to identify
individuals or to discover confidential information
Information Reliability and Quality (Alvaro et al., 2013).
The reliability of data analysis results depends Need of Big Data Experts
on data quality and integrity (Alvaro, Pratyusa,
& Sreeranga, 2013). Therefore, it is important to In the era of Big Data, data analysis is a key factor
verify Big Data sources authenticity and integrity to prevent and detect security incidents. However,
before analysing data. Since the huge volumes of several surveys confirm that a number of enter-
data sets are generated every second, it is difficult prises are not aware of the importance to recruit
to assess the authenticity and integrity of all vari- data scientists for advanced security analysis. In
ous data sources. fact, to ensure Big Data security, organizations
304
Big Data Security
should rely on a multi-disciplinary team with There exist various security models, mecha-
data scientists, mathematicians and security best nisms and solutions for Big Data. However, most
practices programmers (Constantine, 2014). of them are not well known or mature. Many
research projects are currently struggling to en-
Big Data Security on hance their performances (Mahmood & Afzal,
Social Networks 2013). In the following sections, we present some
important ones.
Huge amount of photos, videos, user’s comments
and clicks are generated from social networks Security Foundations for
(SNs). They are usually the first source of infor- Big Data Projects
mation for different entities (Sykora, Jackson,
O’Brien, & Elayan, 2013). For any Big Data project, it is important to con-
Big Data on SNs constitute a valuable mine for sider the strategic priorities related to security
governments to better manage national security and to establish clear organizational guidelines
risks. Indeed, some governments analyse SNs for choosing associated technologies (in term
Big Data in order to supervise public opinions of reliability, performance, maturity, scalability,
(e.g., voting intentions, emotions, feelings about overall cost including maintenance cost). It is
a project or an event). They can prevent terrorist also important to consider the constraints related
and security attacks and assess citizens’ satisfac- to the integration, the existing infrastructure, the
tion regarding public services. available and planned budget for Big Data security
In addition, dynamic analysis of SNs enables management.
crisis committees to optimize and ensure a rapid The goal is to ensure agility across all the
crisis management like in disasters cases. The security systems, solutions, processes and proce-
goal is to detect rapidly abnormal patterns and to dures. Organizational agility is important to enable
ensure a real-time monitoring of alarming events. organizations to face rapid changes in terms of
new security’s requirements: legal changes, new
partners and customers, environment and market’s
BIG DATA SECURITY SOLUTIONS changes, technological updates and innovations,
new security risks and so on.
Nowadays, with the spread of social networks, After the establishment of security values,
distributed systems, multiple connections, mobile strategies and management models, it is important
devices, the security of a Big Data information to deduce and establish clear security policies,
system become the responsibility of all actors guidelines, user agreements as well as security
(e.g., managers, security chiefs, auditors, end-users contractual clauses to respect when outsourcing
and customers). In fact, most of security threats Big Data services. All security strategies and poli-
come from inside users and employees. Thus, it cies should take in consideration many factors:
is convenient to raise the security awareness of First of all, it is essential to establish Big Data
all parties and to promote security best practices classification and management principles guided
of all the connected entities of the digital ecosys- by a long term vision. In fact, data classification
tem. It is not sufficient just to integrate security is mandatory to determine sensitive data and
technologies. The collaboration of all actors is valuable information to protect, to define data
required to eliminate the weak link of the system owners with their security policies, requirements
chain and to ensure compliance to security laws and responsibilities. Then, the security strategy
and policies. should be based on the assessment of security
305
Big Data Security
306
Big Data Security
Several analytical solutions are available to Sub-tree techniques are based on two methods:
secure Big Data such as Accenture, HP, IBM, Top-Down Specialization (TDS) and Bottom-Up
CISCO, Unisys, EADS security solutions. They Generalization (BUG). However, those methods
brings different level of performance and several are not scalable. There is a lack of performance
benefits like: enable Agile decision-making and when such methods are used for certain ano-
rapid reaction through real-time surveillance and nymization parameters. They cannot scale when
monitoring; detect dynamic attacks with enhanced applied to anonymize Big Data on distributed
reliability (low false-positive rate) thanks to the systems.
analysis of active and passive security informa- In order to improve the anonymization of valu-
tion; provide Full visibility of the network status able information extracted from large data sets,
and applications’ security problems. (Zhang et al., 2014) suggests a hybrid approach
(Mahmood & Afzal, 2013) identifies the de- that combines both anonymization techniques
ployment phases of Big Data Analytics solutions. TDS and BUG. This approach selects and applies
First of all, it recommends identifying strategic automatically one of the two techniques that are
priorities and Big Data security analysis goals. suitable for the use case parameters. Thus, this
Then, the organizational priorities should provide hybrid approach provides efficiency, performance
guideline to develop a more detailed strategy for and scalability required to anonymize huge data-
Big Data analysis platform’s deployment. The bases. It is supported by newly adapted programs
purpose is to optimize the selection of security to handle MapReduce paradigm. It enables to re-
solutions according to the strategic goals, the triple duce computation time in the distributed systems
constraints (overall cost, quality, duration of the or the Cloud.
implementation), the added value and the avail- Big Data processing and analysis are based
able features (performance, reliability, scalability most of the time on much iteration to have reliable
and so on). Regarding organizational strategy, and precise results, which may slow computa-
it recommends to adopt a centralized Big Data tions and the performance of security solutions.
management for better security outcomes. Regarding this, (Zhang et al., 2014) proposes
Before Big Data analysis, one pre-requisite is a method based on one-iteration for operations
to consider the integrity and authenticity of data generalization. The goal is to enhance parallelism
sources. Indeed, Big Data sources may contain capacities, the performance and the scalability of
errors, noises, incomplete data, or data without anonymization techniques.
context information. Consequently, it is essential Currently, many projects are working to de-
to filter and prepare data and to add context data velop new techniques and to improve existing ones
before applying analysis techniques. to protect privacy. As an example, some projects
are based on privacy preservation aware analysis
Anonymization of Confidential and scheduling techniques of large data set.
or Personal Data
Data Cryptography
Data anonymization is a recognized technique
used to protect data privacy across the Cloud Data Encryption is a common solution used to
and the distributed systems. Several models and ensure data and Big Data confidentiality. Many
solutions are used to implement this technique researches were conducted to improve the perfor-
such as: Sub-tree data anonymization, t-closeness, mance and the reliability of traditional techniques
m-invariance, k-anonymity and l-diversity. or to create new ways for Big Data encryption
techniques.
307
Big Data Security
Unlike some traditional techniques for encryp- Furthermore, it is essential to change the tra-
tion, Homomorphic Cryptography enables com- ditional governance concept where only security
putation even on encrypted data. Consequently, managers and chiefs, are accountable. It is more
this technique ensures information confidentiality convenient to adopt a centralized security gov-
while allowing extracting useful insight through ernance to meet the challenges of securing Big
some possible analysis and computations on the Data sources on distributed environments. The
encrypted data. organization should involve all the stakeholders
Regarding this solution, (Chen & Huang, 2013) connected to its ecosystem including employ-
proposes an adapted platform to handle MapRe- ees, managers, ISR, operators, users, customers,
duce computations in the case of Homomorphic partners, suppliers, outsources and so on. The
Cryptography. To ensure performance of the cryp- goal is to make all the parties accountable for
tographic solutions in distributed environments, security management to enhance the adoption of
(Liu et al., 2013) suggests a new approach for key security best practices and to ensure standard and
exchange called CBHKE (Cloud Background Hi- law compliance. Users should be aware threats,
erarchical Key Exchange). It is a secured solution regulations and policies.
that is more rapid than its predecessor techniques As an example, partners have to respect data
(IKE and CCBKE). It is based on an iterative access and confidentiality policies. Users have
strategy to an Authenticate Key Exchange (AKE) to update regularly their systems, to make sure
through two phases (layer by layer). However, new that their mobile devices respect standards and
approaches with enhanced performance are still recommended security practices and regulations.
needed to improve the encryption of large data Users have also to avoid installing non reliable
sets on distributed systems. components (e.g., counterfeit software, software
without a valid license). On the other hand, pro-
Centralized Security Management grammers, architectures and designers of Big
Data applications have to integrate security and
(Kasim, Hung, & Li, 2012) recommend storing privacy requirements though out all the develop-
data on the Cloud rather than mobile devices. The ment life cycle. The outsourcers should be made
goal is to take advantage of the normalized and accountable for Big Data security through clear
standard compliance infrastructure and central- security clauses.
ized security mechanisms of the Cloud. Indeed,
the Cloud platforms are regularly updated and Data Confidentiality and
continuously monitored for an enhanced security. Data Access Monitoring
However, “Zero risk” is hard to achieve. In
fact, data security relies on the hand of the Cloud There is an increasing spread of security threats
outsourcers and operators. In addition, the Cloud because of the increasing data exchange over
is very attractive for attackers as it is a centralized distributed systems and the Cloud. To face these
mine of valuable data. Data owners and managers security challenges, (Tankard, 2012) proposes
should be aware of the security risks and define to enhance the control by integrating controls at
clear data access policies. They have to ensure data level and during storage phase. In fact, it has
that the required security level is ensured when proved that controls at application and system
outsourcing Big Data management, storage or levels are not sufficient.
processing.
308
Big Data Security
In addition, access controls have to be well Authentication mechanisms are most of the
granulated to limit the access by role and respon- time, complex and heavy to handle across distrib-
sibilities. There exist many techniques to ensure uted clusters and large data sets. For this reason,
access control and data confidentiality such as (Zhao et al., 2014) suggests a model for security
ICP, certificates, smart-cards, federated identity on G-Hadoop that integrates several security solu-
management, multi-factors authentication. tions. It is based on Signe Sign On (SSO) concept
For example, Law Enforcement Agencies that simplifies users authentication and the compu-
(LEA) of USA have launched INDECT project in tation of MapReduce functions. Thus, this model
order to implement a secured infrastructure for a enables users to access different clusters with the
secured data exchange between agencies and other same account identifier. Furthermore, privacy is
members (Stoianov, Uruena, Niemiec, Machnik, protected through encrypted connections based on
& Maestro, 2013). The solution includes: SSL protocol, Public Key Cryptography and valid
certificates for authentication. Hence, this model
• A public Key Infrastructure (PKI) with offers an efficient access control and a protection
three levels (certification authority, users against hackers and attackers (e.g., MITM attacks,
and machines). The PKI provides access version rollback, delay attack) and deny access to
control based on a multi-factor authentica- fraudulent or untruthful entities.
tion and the security level required for each (Mansfield-Devine, 2012) recommends to
data type. For instance, access to highly involve not just security chiefs but also to make
confidential applications requires a valid responsible all end users for a better access control.
certificate and a password. It suggests also combining different types of con-
• A Federated Identity Management is a trols inside multi-silo environments (e.g., archives,
concept used by the INDECT platform to data loss prevention, access control, logs).
enhance access control and security. This
type of federated management is delegat- Security Surveillance and Monitoring
ed to an Identity Provider (IdP) within a
monitored trust domain. It is based on two It is important to ensure a continuous surveillance
security tools: certificates and smart-cards. in order to detect in real time security incidents,
Those tools are used to store user certifi- threats and abnormal behaviours. To ensure Big
cates issued by the PKI to encrypt and sign Data security surveillance, some solutions are
documents and emails. available such as: Data Loss Prevention (DLP),
• An INDECT Block Cipher IBC algorithm Security Information and Event Management
is a new algorithm for asymmetric cryp- (SIEM) and dynamic analysis of security events.
tography. It was developed and used to en- Such solutions are based on consolidation and cor-
crypt databases, communication sessions relations methods between multiple data sources,
(TLS-SSL) and VPN tunnels. The goal is to and on contextualization tools (to add context
ensure a high level of data confidentiality. as data attribute to the extracted data). It is also
• Secured communications based on VPN important to conduct regular audits and to verify
and TLS-SSL protocols. Those mecha- the respect of security policies and recommended
nisms are used to protect access to Big best practices by users and employees.
Data servers.
309
Big Data Security
Security challenges that are facing critical large Big Data security management in Smart Grid
scale infrastructures can be classified according to environment aims to ensure data confidential-
various parameters. (Pathan, 2014) recommends ity, integrity and availability for efficient and
a holistic multi-layered security approach to deal reliable grid operations. To increase Smart Grid
with this issue at all grid levels (i.e., physical, attacks-resilience and response to the previous
cybernetics and data). In fact, Smart Grid environ- cited challenges, several solutions and research
ment incorporates many distributed subsystems efforts have been made.
and end-points, including distributed sensors and Considering the cascading failure aspect and
actuators, ever-growing number of Intelligent the complexity of the Smart Grid system, it is
Electronic Devices (IED) with different power recommended to adopt multi-layered approach.
requirements (e.g., electric vehicles and smart This helps to secure not only data layer but also
homes), electric generators and control applica- all the other layers: physicals, networks, hosts,
tions. In addition, these subsystems and end-points data stores, applications, policies and regulations.
have multiple bidirectional communications For instance, at the physical layer, it is important
between each other. to ensure the security of equipments, consumer
310
Big Data Security
devices, substations, sensors, control centres and by legal actions and regulations in order to protect
so on. Concerning the cyber layer, it is recom- the valuable data and other assets. Therefore,
mended to secure network communications and security requirements, policies, regulations and
eliminate weak points from the cyber topology standards should be updated regularly to consider
(e.g., bad intrusion detection algorithms). Regard- evolving security issues.
ing the data layer, it is crucial to ensure granular
access control and granular audits (Alvaro A. C.,
Pratyusa K. M., & Sreeranga P. R., 2013). CONCLUSION
A successful security management of Smart
Grid system should incorporate real-time security Big Data applications promise interesting op-
monitoring. Big Data analytics algorithms are one portunities for many sectors. In fact, extracting
of the powerful solutions recommended to face valuable insight and information from disparate
such issue. Those algorithms drive improved pre- large data sources enables to improve the competi-
dictions and more precise analysis. They are often tive advantage of organizations. For instance, the
based on Machine Learning techniques and use analysis of data streams or archives (e.g., using
not only traditional security logs and events, but predictive or identification models) can help to
also performance and costumers’ data to recognize optimize production processes, to enhance services
and prevent malicious behaviours (Khorshed, Ali, with added value and to adapt them to customers’
& Wasimi, 2014). needs. However, Big Data sharing and analysis rise
Real-time monitoring can be part in mitiga- many security issues and increase privacy threats.
tion strategy to help security decision making. It This chapter presents some of the important Big
assists security analysts to decide if a preventive Data security challenges and describes related
or remedial action should be taken (e.g., change solutions and recommendations. Because it is
user roles or privileges, suspend suspicious ac- nearly impossible to secure very large data sets, it
cess, correct network configurations). The list is more practical to protect the data value and its
of actions depends on the nature of incident and key attributes instead of the data itself, to analyse
its impact. They can be implemented through security risks of combining different evolving Big
an automatically or semi-automatically process. Data technologies and to choose security tools
Ensuring continuous updates of such strategy is a according to the goals of the Big Data project.
good practice. This helps integrating new security
solutions, new laws as well as emerging security
practices and models. REFERENCES
Considering the complexity and evolving
nature of cybercrimes, it is important to promote Alvaro, A. C., Pratyusa, K. M., & Sreeranga, P.
continuous commitment and timely sharing of R. (2013). Big data analytics for security. IEEE
security information between all Smart Grid Security and Privacy, 11(6), 74–76. doi:10.1109/
partners, utilities, and specialized organizations MSP.2013.138
in cybercrimes (Bughin, J., Chui, M., & Manyika, Berman, J. J. (2013). Principles of big data:
J. 2010). Preparing, sharing, and analyzing complex infor-
It is well known that security techniques are not mation. San Francisco, CA: Morgan Kaufmann
sufficient in any industry. They have to be guided Publishers Inc.
311
Big Data Security
Bodei, C., Degano, P., Ferrari, G. L., Galletta, L., Khorshed, M. T., Ali, A. B., & Wasimi, S. A.
& Mezzetti, G. (2012). Formalising security in (2014). Combating Cyber Attacks in Cloud Sys-
ubiquitous and cloud scenarios. In A. Cortesi, N. tems Using Machine Learning. In S. Nepal &
Chaki, K. Saeed, & S. Wierzchon (Eds.), Computer M. Pathan (Eds.), Security, Privacy and Trust in
information systems and industrial management: Cloud Systems (pp. 407–431). Berlin, Germany:
Proceedings of the 11th IFIP TC 8 International Springer. doi:10.1007/978-3-642-38586-5_14
Conference (LNCS) (Vol. 7564, pp. 1-29). Berlin,
Kim, S. H., Kim, N. U., & Chung, T. M. (2013).
Germany: Springer. doi:10.1007/978-3-642-
Attribute relationship evaluation methodology for
33260-9_1
big data security. In Proceedings of the Interna-
Bughin, J., Chui, M., & Manyika, J. (2010). Clouds, tional Conference on IT Convergence and Security
big data, and smart assets: Ten tech-enabled (pp. 1-4). Macao, China: IEEE. doi:10.1109/
business trends to watch. Retrieved from http:// ICITCS.2013.6717808
www.mckinsey.com/insights/high_tech_tele-
Liu, C., Zhang, X., Liu, C., Yang, Y., Ranjan,
coms_internet/clouds_big_data_and_smart_as-
R., Georgakopoulos, D., & Chen, J. (2013). An
sets_ten_tech-enabled_business_trends_to_watch
iterative hierarchical key exchange scheme for
Chen, X., & Huang, Q. (2013). The data protection secure scheduling of big data applications in
of mapreduce using homomorphic encryption. In cloud computing. In Proceedings of the 12th IEEE
Proceedings of the 4th IEEE International Confer- International Conference on Trust, Security and
ence on Software Engineering and Service Science Privacy in Computing and Communications (pp.
(pp. 419-421). Beijing, China: IEEE. 9-16). Washington, DC: IEEE Computer Society.
doi:10.1109/TrustCom.2013.65
Constantine, C. (2014). Big data: An informa-
tion security context. Network Security, 2014(1), Lu, T., Guo, X., Xu, B., Zhao, L., Peng, Y., &
18–19. doi:10.1016/S1353-4858(14)70010-8 Yang, H. (2013). Next big thing in big data: The
security of the ict supply chain. In Proceedings of
Jensen, M. (2013). Challenges of privacy protec-
the International Conference on Social Computing
tion in big data analytics. In Proceedings of IEEE
(pp. 1066-1073). Washington, DC: IEEE Com-
International Congress on Big Data (pp. 235-
puter Society. doi:10.1109/SocialCom.2013.172
238). Washington, DC: IEEE Computer Society.
doi:10.1109/BigData.Congress.2013.39 Mahmood, T., & Afzal, U. (2013). Security analyt-
ics: Big data analytics for cybersecurity: A review
Kasim, H., Hung, T., & Li, X. (2012). Data value
of trends, techniques and tools. In Proceedings
chain as a service framework: For enabling data
of the 2nd National Conference on Information
handling, data security and data analysis in the
Assurance (pp. 129-134). Rawalpindi, Pakistan:
cloud. In Proceedings of the 18th IEEE Interna-
IEEE. doi:10.1109/NCIA.2013.6725337
tional Conference on Parallel and Distributed Sys-
tems (pp. 804-809). Washington, DC: IEEE Com- Mansfield-Devine, S. (2012). Using big data to
puter Society. doi:10.1109/ICPADS.2012.131 reduce security risks. Computer Fraud & Secu-
rity, (8): 3–4.
Katal, A., Wazid, M., & Goudar, R. (2013). Big
data: Issues, challenges, tools and good prac- Pathan, A. S. K. (2014). The state of the art in
tices. In Proceedings of the Sixth International intrusion prevention and detection. Boca Raton,
Conference on Contemporary Computing (pp. FL: Auerbach Publications. doi:10.1201/b16390
404-409). Noida, India: IEEE. doi:10.1109/
IC3.2013.6612229
312
Big Data Security
Ring, T. (2013). It’s megatrends: The security im- Zhao, J., Wang, L., Tao, J., Chen, J., Sun, W., &
pact. Network Security, 2013(7), 5–8. doi:10.1016/ Ranjan, R. et al. (2014). A security framework in
S1353-4858(13)70080-1 g-hadoop for big data computing across distrib-
uted cloud data centres. Journal of Computer and
Shin, D., Sahama, T., & Gajanayake, R. (2013).
System Sciences, 80(5), 994–1007.
Secured e-health data retrieval in daas and big data.
In Proceedings of the 15th IEEE international
conference on e-health networking, applications
services (pp. 255-259). Lisbon, Portugal: IEEE. KEY TERMS AND DEFINITIONS
doi:10.1109/HealthCom.2013.6720677
Anonymization: Anonymization is the process
Stimmel, C. L. (2014). Big data analytics strate- of protecting data privacy across information
gies for the smart grid. Boca Raton, FL: Auerbach systems. Several models and methods are used to
Publications. doi:10.1201/b17228 implement it such as: t-closeness, m-invariance,
Stoianov, N., Uruena, M., Niemiec, M., Machnik, k-anonymity and l-diversity.
P., & Maestro, G. (2012). Security Infrastructures: Authentication: Authentication aims to test
Towards the INDECT System Security. Paper and ensure, with a certain probability, that par-
presented at the 5th International Conference on ticular data are authentic, i.e., they have not been
Multimedia Communication Services & Security changed.
(MCSS), Krakow, Poland. doi:10.1007/978-3- Big Data: Big Data is mainly defined by its
642-30721-8_30 3Vs fundamental characteristics. The 3Vs include
Velocity (data are growing and changing in a
Sykora, M., Jackson, T., O’Brien, A., & Elayan, rapid way), Variety (data come in different and
S. (2013). National security and social media multiple formats) and Volume (huge amount of
monitoring: A presentation of the emotive and data is generated every second).
related systems. In Proceedings of the European Confidentiality: Confidentiality is a property
Intelligence and Security Informatics Confer- that ensures that data is not made disclosed to
ence (pp. 172-175). Uppsala, Sweden: IEEE. unauthorized persons. It enforces predefined rules
doi:10.1109/EISIC.2013.38 while accessing the protected data.
Tankard, C. (2012). Big data security. Network Encryption: Encryption relies on the use
Security, (7): 5–8. of encryption algorithms to transform data into
encrypted forms. The purpose is to make them
Wu, X., Zhu, X., Wu, G.-Q., & Ding, W. (2014). unreadable to those who do not possess the en-
Data mining with big data. IEEE Transactions on cryption key(s).
Knowledge and Data Engineering, 26(1), 97–107. Privacy: Privacy is the ability of individuals
doi:10.1109/TKDE.2013.109 to seclude information about themselves. In other
Zhang, X., Liu, C., Nepal, S., Yang, C., Dou, W., words, they selectively control its dissemination.
& Chen, J. (2014). A hybrid approach for scal- Security Management: Security management
able sub-tree anonymization over big data using is a part of the overall management system of an
mapreduce on cloud. Journal of Computer and organization. It aims to handle, implement, moni-
System Sciences, 80(5), 1008–1020. doi:10.1016/j. tor, maintain, and enhance data security.
jcss.2014.02.007
313