

Gautam, Research Scholar, UIET, Rohtak
Dr. Chavi Rana, Assistant Professor, UIET, Rohtak

Storing, managing and processing huge amounts of data is very difficult. The term Big Data
describes the various techniques and technologies used to store, distribute, manage and analyze
huge amounts of data with different structures. Big data consists of structured, unstructured and
semi-structured data, which conventional data management methods are unable to handle. To
process such huge amounts of data inexpensively and efficiently, parallelism is used. Big Data is
data that is large in volume and complex, and this complexity requires new architectures,
techniques, algorithms and analytics to manage it and extract knowledge from it. Hadoop is a
framework for processing large amounts of data: it provides storage capacity for large datasets
and processes big data in parallel, which gives greater computational power to the tasks. It
works in batch processing mode. Hadoop is the core platform for structuring Big Data, and it also
solves the problem of making the data useful for analytics. In this paper, we provide a brief
overview of Big Data management with Hadoop and highlight research efforts and the challenges
of big data.

Index Terms: Big Data, Hadoop, MapReduce, HDFS, Hadoop Components.

1. Introduction:
1.1. Big Data: Definition

Big data is a term used to describe the exponential growth and availability of data, including
structured, unstructured and semi-structured data, whose size (volume), complexity (variability)
and rate of growth (velocity) make it difficult or even impossible to manage and analyze using
conventional software tools and technologies. As the amount of data increases, the time needed to
produce results also increases, and retrieving information from big data is still a complex and
time-consuming process. Big data provides tremendous opportunities for enterprise information
management and decision making. In recent studies, big data is not limited to business needs; it
also helps with research and scientific problems.

The Big Data problem is characterized by the 3V features:

Volume- a huge amount of data. The volume of big data can be measured in megabytes, gigabytes,
terabytes or petabytes.
Velocity- a high data ingestion rate or the speed with which the data can be analyzed.
Variety- a mix of structured data, semi-structured data, and unstructured data.
These 3V features challenge data processing systems, because such systems either cannot scale to
the huge data volume in a cost-effective way or cannot handle data of various types.
The solutions to the Big Data problem are largely based on the MapReduce framework [9] and its
open source implementation, Hadoop, which handles the data volume challenge successfully. Hadoop
is open source software developed by the Apache Software Foundation, and it is Linux-based. It is
used by well-known websites such as Google, Yahoo, Facebook, Amazon and many more. Hadoop provides
storage capacity for large datasets and processes big data in parallel, which gives greater
computational power to the tasks. It works in batch processing mode and has two major components:
HDFS (Hadoop Distributed File System) [12] for storing huge amounts of data and MapReduce for
processing huge datasets. As the data size increases, existing algorithms struggle to manage it,
so the main problem is to store and process that huge amount of data. Hadoop solves this problem
because it stores and processes huge amounts of data in less time.

1.2. Hadoop:

Hadoop is an open-source software framework used for distributed storage and processing of big
data using the MapReduce programming model. Modules present in Hadoop are designed with a
fundamental assumption that hardware failures are common occurrences and should be
automatically handled by the framework. The core of Hadoop consists of two parts: the storage
part and the processing part.
a) Storage part: The storage part of Hadoop is HDFS (Hadoop Distributed File System), which stores
huge amounts of data with a high degree of throughput; this data is stored in the form of blocks.
b) Processing part: The processing part of Hadoop is MapReduce, a software framework that
processes large amounts of data in parallel across a cluster.
Hadoop distributes the work to the nodes of a cluster so that they process data in parallel. This
approach also takes advantage of data locality, where each node processes the data stored on it.
This allows the dataset to be processed faster and more efficiently than in a more conventional
supercomputer architecture, which relies on a parallel file system where computation and data are
distributed via high-speed networking.

Fig. 1.1. Hadoop architecture

A small Hadoop cluster has a single master node and multiple worker nodes, called slave nodes, as
shown in Fig. 1.1. The master node runs a TaskTracker, JobTracker, NameNode and DataNode [14],
whereas a slave or worker node acts as both a DataNode and a TaskTracker.
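The data-locality idea can be illustrated with a minimal sketch of how a master might assign map tasks to worker slots. It assumes that each map task reads one HDFS block, that the master knows which nodes hold a replica of each block, and that each TaskTracker advertises a number of free map slots. The function name `assign_map_tasks` and the greedy two-pass policy are illustrative assumptions, not Hadoop's actual JobTracker scheduler, which also distinguishes rack-local from off-rack placement.

```python
from collections import deque


def assign_map_tasks(block_locations, free_slots):
    """Hypothetical greedy scheduler: prefer a node that already stores the block.

    block_locations: dict of block id -> list of nodes holding a replica
    free_slots: dict of node -> number of free map slots on that node
    Returns a dict of block id -> (node, is_data_local).
    """
    slots = dict(free_slots)  # copy so the caller's dict is not modified
    assignment = {}
    pending = deque()

    # Pass 1: data-local placement, so the task reads its block from local disk.
    for block, nodes in block_locations.items():
        local = next((n for n in nodes if slots.get(n, 0) > 0), None)
        if local is None:
            pending.append(block)
        else:
            slots[local] -= 1
            assignment[block] = (local, True)

    # Pass 2: any free slot; the block must then be read over the network.
    while pending:
        remote = next((n for n, s in slots.items() if s > 0), None)
        if remote is None:
            break  # no capacity left; remaining tasks wait for a later round
        block = pending.popleft()
        slots[remote] -= 1
        assignment[block] = (remote, False)

    return assignment


if __name__ == "__main__":
    blocks = {"b1": ["node1", "node2"], "b2": ["node2"], "b3": ["node1"]}
    slots = {"node1": 1, "node2": 1, "node3": 1}
    print(assign_map_tasks(blocks, slots))
```

With the sample data, b1 and b2 run on nodes that already store them, while b3 falls back to node3 because node1's only slot is taken. The greedy order means the result depends on the order in which blocks are visited; a production scheduler would weigh this against fairness and waiting time.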

1.3. HDFS:

The Hadoop Distributed File System (HDFS) is the storage component of Hadoop; it stores huge
amounts of structured, unstructured and semi-structured data. HDFS is a Java-based, reliable and
manageable file system. It offers features such as high availability, load balancing, security,
flexible access, fault tolerance, easy management and high data throughput, and it supports
parallel processing of data. HDFS has a master/slave architecture [23].
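How HDFS stores a file as replicated blocks, and how the master/slave design tolerates a DataNode failure, can be sketched as follows. This is a simplified model under stated assumptions: the block size of 128 MB and replication factor of 3 are the usual defaults (`dfs.blocksize` and `dfs.replication` in Hadoop 2.x and later), the round-robin placement is a hypothetical stand-in for HDFS's rack-aware placement policy, and the function names are illustrative. In HDFS the NameNode (master) keeps this block-to-DataNode mapping and schedules re-replication when a DataNode stops sending heartbeats.

```python
from itertools import cycle

# Assumed defaults: 128 MB blocks (dfs.blocksize) and 3 replicas (dfs.replication).
BLOCK_SIZE = 128 * 1024 * 1024
REPLICATION = 3


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes of the blocks that a file of file_size bytes is split into."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Hypothetical round-robin placement: block index -> list of DataNodes.

    This is a simplification; real HDFS uses a rack-aware placement policy.
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds the number of DataNodes")
    ring = cycle(datanodes)
    # Consecutive picks from the ring are distinct because replication <= len(datanodes).
    return {b: [next(ring) for _ in range(replication)] for b in range(num_blocks)}


def handle_datanode_failure(placement, failed, live_nodes, replication=REPLICATION):
    """Drop a failed DataNode and copy under-replicated blocks to other live nodes."""
    for nodes in placement.values():
        if failed in nodes:
            nodes.remove(failed)
        candidates = [n for n in live_nodes if n not in nodes and n != failed]
        while len(nodes) < replication and candidates:
            nodes.append(candidates.pop(0))
    return placement


if __name__ == "__main__":
    sizes = split_into_blocks(300 * 1024 * 1024)  # a 300 MB file -> 128 MB, 128 MB, 44 MB
    placement = place_replicas(len(sizes), ["dn1", "dn2", "dn3", "dn4"])
    print(placement)
    print(handle_datanode_failure(placement, "dn1", ["dn2", "dn3", "dn4"]))
```

Because every block has several replicas on different nodes, losing one DataNode does not lose data; the master only has to restore the replica count on the surviving nodes.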

Fig. 1.2. HDFS Architecture

1.4. Hadoop MapReduce:

MapReduce is a Java-based programming paradigm for processing huge amounts of data stored in
HDFS. MapReduce is the heart of the Hadoop framework and provides scalability across thousands of
nodes in a Hadoop cluster. Every MapReduce job performs two tasks: a Map task and a Reduce task.
The Map task takes a set of data, processes it at node level and generates output. The Reduce task
takes the output of the Map task as its input and combines it into a smaller set of tuples
(reducing the large dataset to a smaller one) based on transformations and other logic. The
advantage of MapReduce is that it makes it easy to scale data processing over multiple computing
nodes.
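The two tasks can be summarized by the type signatures commonly given for MapReduce since Google's original MapReduce paper. Here $k_1, v_1$ are the input key and value types and $k_2, v_2$ the intermediate ones; in word count, for instance, $k_1$ could be a line's byte offset in the file, $v_1$ the line of text, $k_2$ a word and $v_2$ its count.

```latex
\begin{aligned}
\mathrm{map}    &: (k_1, v_1) \rightarrow \mathrm{list}(k_2, v_2) \\
\mathrm{reduce} &: (k_2, \mathrm{list}(v_2)) \rightarrow \mathrm{list}(v_2)
\end{aligned}
```

Between the two, the framework groups all intermediate pairs by $k_2$ (the shuffle), so each reduce call receives every value emitted for one key.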
Fig. 1.3. MapReduce Architecture

Map stage: The map stage processes the input data, as shown in Fig. 1.3. Generally, the input data
is a file or directory stored in the Hadoop Distributed File System (HDFS). The input file is
passed to the map function, which processes the data and creates several small chunks of data.
Reduce stage: The reducer processes the data that comes from the map stage. After processing, it
produces a new set of output, which is stored in the Hadoop Distributed File System (HDFS).
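The map and reduce stages can be simulated in a single process with the classic word-count example. This is a minimal sketch under simplifying assumptions: each input split is an in-memory string, the grouping step stands in for Hadoop's shuffle and sort, and the function names are hypothetical. Real Hadoop jobs are typically written in Java against the `Mapper` and `Reducer` classes of `org.apache.hadoop.mapreduce`, with map tasks running on different nodes and the reducer output written to HDFS.

```python
from collections import defaultdict
from collections.abc import Iterable, Iterator


def map_phase(split_id: str, text: str) -> Iterator[tuple[str, int]]:
    """Map stage: emit an intermediate (word, 1) pair for every word in one input split.

    split_id plays the role of the input key; word count does not need it.
    """
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word: str, counts: list[int]) -> tuple[str, int]:
    """Reduce stage: combine all values for one key into a single output record."""
    return word, sum(counts)


def run_word_count(splits: dict[str, str]) -> dict[str, int]:
    """Run map on every split, shuffle the intermediate pairs, then reduce each key in sorted order."""
    intermediate = [pair for split_id, text in splits.items() for pair in map_phase(split_id, text)]
    grouped = shuffle(intermediate)
    return dict(reduce_phase(word, counts) for word, counts in sorted(grouped.items()))


if __name__ == "__main__":
    splits = {"split-1": "big data needs big storage", "split-2": "Hadoop stores big data"}
    print(run_word_count(splits))
```

For the two sample splits, "big" is counted three times and "data" twice. Because each `map_phase` call depends only on its own split and each `reduce_phase` call only on one key's values, both stages can run on many nodes at once, which is what makes MapReduce easy to scale.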

2. Literature Survey:
This paper reviews different approaches used in Big Data in recent years. The following table
lists the surveyed research with the names of the authors, the year of publication (in descending
order), the proposed work and the techniques used:

Authors | Publication Year | Proposed Work | Technique Used
Daniele Apiletti, Elena Baralis, Tania Cerquitelli, Paolo Garza, Fabio Pulvirenti, Luca Venturini [33] | 2017 | Reviews Hadoop- and Spark-based scalable algorithms for the frequent itemset mining problem in the Big Data domain, with both theoretical and experimental comparative analyses. | Spark-based mining algorithms for Big Data are used.
Dinesh J. Prajapati, Sanjay Garg, N.C. Chauhan [34] | 2017 | The proposed method first extracts multilevel association rules, including level-crossing rules, for each zone using DMFPM. Both multilevel consistent and inconsistent rules are then evaluated and compared based on different experimental results that lead to the final conclusions. | DMFPM is used to extract multilevel association rules, including level-crossing rules, for each zone.
Robin Genuer, Jean-Michel Poggi, Christine Tuleau-Malot, Nathalie Villa-Vialaneix [35] | 2017 | Proposed a selective review of scaling random forests to Big Data problems and described how the out-of-bag error is addressed. | Addressing the out-of-bag error problem.
M. Bakratsas, P. Basaras, D. Katsaros, L. Tassiulas [36] | 2017 | Investigated the relative performance and benefits of SSDs versus hard disk drives (HDDs) when used as storage for Hadoop's MapReduce. | SSDs and HDDs are evaluated by executing algorithms on real social network data.
Ziliang Zong, Rong Ge, Qijun Gu [37] | 2017 | Presented the design of the Marcher system and demonstrated its measurement tools for obtaining power consumption data in different research. | Designed the Marcher system and its tools.
Guangchen Ruan and Hui Zhang [38] | 2017 | Proposed a framework that integrates information visualization, scalable computing and user interfaces to explore large-scale multi-modal data streams; together these provide an effective and efficient way to perform closed-loop big data analysis with visualization and scalable computing. | A parallel mining algorithm running on HPC is used.
Navroop Kaur, Sandeep K. Sood [39] | 2017 | Presented a resource management system that solves the problems of selecting and allocating appropriate resources for big data, using the 4V properties of big data. | CoD and SOM are used to estimate big data characteristics.
Feras A. Batarseh, Eyad Abdel Latif [40] | 2016 | Studied healthcare data collected from various sources, assessing the quality and best practices of the field using big data tools. | Assesses QoS and examines historical health data by
Dawei Jiang, Sai Wu, Gang Chen, Beng Chin Ooi, Kian-Lee Tan, Jun Xu [2] | 2016 | Presents epiC, an extensible system to address Big Data's data variety challenge, together with the design and implementation of epiC's concurrent programming model and two customized data processing models. | Introduces a new programming model system called epiC.
Marcos D. Assunção, Rodrigo N. Calheiros, Silvia Bianchi, Marco A.S. Netto, Rajkumar Buyya [3] | 2015 | Discusses environments for carrying out analytics on Clouds for Big Data applications. Through the survey, the authors identify possible gaps in technology and provide future directions for Cloud-supported Big Data computing. | Defines various methods used in data management, model development, visualization and business
Sreedhar C.N, Kasiviswanath, P. Chenna Reddy [1] | 2015 | The primary purpose of the work is to provide a comprehensive survey of Big Data management and an overview of various algorithms related to job scheduling in Hadoop. | Delay and genetic scheduling algorithms are used.
Chao Wang, Xi Li, Peng Chen, Aili Wang, Xuehai Zhou, Hong Yu [19] | 2015 | Proposed an FPGA-based acceleration solution with the MapReduce framework. The combination of hardware acceleration and the MapReduce execution flow can speed up the task of aligning short reads to a known reference genome. | An FPGA-based acceleration solution with the MapReduce framework is used.
Tao Xu, Dongsheng Wang, Guodong Liu [20] | 2015 | Presented Banian, an efficient system for managing PB-level structured data; Banian overcomes the storage problem. | A system for PB-level structured data called Banian.
Qinghua Lu, Zheng Li, Maria Kihl, Liming Zhu, Weishan Zhang [26] | 2015 | Presented the conceptual framework CF4BDA to analyze existing work on BDA applications, covering the lifecycle of BDA applications and the objects involved in BDA applications in the cloud. | The CF4BDA framework is used to analyze work on BDA applications.
Claudio A. Ardagna, Ernesto Damiani, Fulvio Frati, Davide Rebeccani [25] | 2015 | Presented a score-based benchmark for NoSQL databases that supports adopters. The proposed benchmark is independent of the specific configurations of the database and deployment environment. | A score-based benchmark for NoSQL databases.
Hongbing Wang, Chao Yu, Lei Wan, Qi Yu [24] | 2015 | Proposed heterogeneous and trust-based service selection by developing a novel multi-objective optimization approach that makes trade-off decisions between a service's trust value and users' QoS preferences to rank candidates. | Heterogeneous and trust-based selection using an optimization approach.
Simon Fong, Raymond Wong, Athanasios V. Vasilakos [23] | 2015 | Presented algorithms to collect big data that is present to a large degree and tested them for performance evaluation using accelerated particle swarm optimization (APSO), a type of swarm search that enhanced analytical accuracy within reasonable processing time. | Accelerated Particle Swarm Optimization (APSO) algorithms to collect big data.
Yanhao Huang, Xiaoxin Zhou [22] | 2015 | Proposed the structure, elements, basic calculations and multi-dimensional reasoning method of a new knowledge model. The research shows that the model is more powerful and adapts to the various knowledge requirements of electric power big data. | The knowledge model is established and various calculations are performed.
Marco Viceconti, Peter Hunter, Rod Hose [21] | 2015 | Proposed that big data analytics can be successfully combined with VPH technology to provide desirable medical solutions. | VPH technology combined with big data
Alun Evans, Javi Agenjo, Josep Blat [28] | 2015 | Presented a web-based application for analytic visualization of on-set media data and metadata, combining research from several fields, including image processing and 3D graphics. | WebGL 3D on the web and metadata visualization techniques are used.
Syed Akhter Hossain [29] | 2015 | Described the nascent field of big data analytics in education, discussing prospects, challenges and the way forward, with a focus on research and development issues for educationists and practitioners of big data analytics. | Nascent field of big data analytics in education and development issues for educationists of big data analytics is
Xue-Wen Chen, Xiaotong Lin [30] | 2014 | Presented an overview of deep learning and highlighted current research efforts, the challenges of big data and future trends. | Unprecedented challenges to harnessing data and information are presented.
Matturdi Bardi, Zhou Xianwei, Li Shuai, Lin Fuhong [32] | 2014 | Reviewed the various benefits and challenges of security and privacy in Big Data and presented some possible methods and techniques to ensure Big Data security and privacy. | Big data security and privacy techniques are defined.
Suman Arora, Dr. Madhu Goel [7] | 2014 | Studied and analyzed various scheduling techniques that enhance performance in Hadoop. | Speculative execution and Copy compute technique of
Chang Liu, Jinjun Chen, Chi Yang, Rajiv Ranjan, Ramamohanarao Kotagiri [16] | 2014 | Presented types of fine-grained data updates and a scheme that fully supports authorized auditing and fine-grained update requests. Also proposed an enhancement that reduces communication overheads for verifying small updates. | A scheme for supporting variable-sized data blocks.
Shifeng Fang, Li Da Xu, Yunqiang Zhu, Jiaerheng Ahati, Huan Pei, Jianwu Yan, Zhihui Liu [17] | 2014 | Introduces a novel IIS that combines the Internet of Things (IoT), Cloud Computing, Geoinformatics, geographical information systems (GIS) and e-Science for environmental monitoring and management, with a case study on climate change and its ecological effects in a particular region. | Combines IoT, GIS and e-Science for environmental monitoring and management.
Daisuke Takaishi, Hiroki Nishiyama, Nei Kato, Ryu Miura [18] | 2014 | Proposed a new mobile sink routing and data gathering method using network clustering based on a modified expectation maximization technique. | The EM algorithm is used for clustering.
Andrea Marinoni, Arianna Dagliati, Riccardo Bellazzi, Paolo Gamba [27] | 2013 | Studied the connection between air pollution and clinical records so that correlations among black particulate concentration and micro- and macro-vascular disease can be drawn properly. | Micro- and macro-vascular disease can be drawn by creating connection between various
Xiongpai Qin, Xiaoyun Zhou [4] | 2013 | Reviewed big data benchmark work of the last several years and analyzed its characteristics. | MRBench is used for evaluating the MapReduce
Rakesh Varma [6] | 2013 | Studied MapReduce and various scheduling algorithms that enhance scheduling performance. | For managing Big Data, various scheduling algorithms and execution is
Daniel Warneke [15] | 2011 | Discusses the opportunities and challenges of parallel data processing in clouds and presents Nephele; evaluates MapReduce processing and compares the results with the Hadoop framework. | Nephele, a new data processing framework, is used.
Jasmin Azemovic, Denis Music [13] | 2010 | Presented research on using different data types for storing unstructured data within a database, inspired by the current situation of the information society. | Defines various ways of storing unstructured data.
Mengjie Zhou, Haoji Hu, Minqi Zhou [14] | 2010 | Proposed an SLCA (Smallest Lowest Common Ancestor) based keyword search implementation for large-scale XML data sets on a MapReduce cluster. | SLCA-based keyword search implementation for large-scale
Leonardo Neumeyer, Bruce Robbins, Anish Nair, Anand Kesari [5] | 2010 | Outlined the S4 architecture and described real-life deployments, including large-scale applications for data mining and machine learning. | The S4 architecture is used for dealing with unbounded streams of data.
BI Shuoben, XU Yin, JIAO Feng, LÜ Guonian, PEI Anping [12] | 2009 | Introduces the single-dimensional Boolean association rule based on the Apriori algorithm, and a data mining algorithm for multi-dimensional association rules based on the BUC algorithm. | Single-dimensional Boolean association rules on the Apriori algorithm, and the BUC algorithm.
Hui Fang, Ming Yang, Ruqing Yang [11] | 2007 | Proposed an approach to localize a vehicle's position with respect to a global map, based on the texture of the ground over which the vehicle moves. | Dimensioning technique for localizing with respect to the global map.
Seema Metikurke, Vijay K. Vaishnavi [10] | 2006 | Describes a grid-enabled approach for automatic web page classification that applies the vector space model information retrieval strategy. | Grid-enabled approach for automatic web page classification.
John H. Phan, Chang F. Quo, May D. Wang [9] | 2004 | Reports the results of the first development phase of a novel system that uses unsupervised clustering methods to discover relationships among genes and knowledge-based supervised classification to obtain accurate predictions in cancer diagnosis. | Unsupervised clustering methods to discover relationships of genes and
Sushant Goel, Hema Sharda, David Taniar [8] | 2003 | Distributes scheduling responsibilities to the nodes where the data is actually located and proposes a new serializability criterion, Parallel Database Quasi-Serializability. | A new serializability criterion, Parallel Database Quasi-Serializability (PDQS), is used.

3. Challenges:
Because big data is so large, it poses a set of challenges in management, storage, scheduling,
security and processing. First, data preparation, efficient distributed storage and search are
required for effective online analysis, which in turn requires effective data mining techniques.
Efficient handling of big data streams is a major challenge that is addressed with various
programming models.
Second, scheduling: a scheduling approach should be smart enough to respond in real time to a
changing environment. Third, data integration: new protocols and interfaces are required that can
manage structured, semi-structured and unstructured data. Fourth, visualization and user
interaction: there are many open research challenges in big data visualization, so more efficient
real-time visualization techniques are required.
In addition, security and privacy are major issues in big data. Security is crucial in any
organization, so strong mechanisms for data privacy are needed.

4. Conclusion:
This paper presents a survey of different big data approaches from recent years. It finds that
solutions to the Big Data problem are largely based on the MapReduce framework and its open source
implementation, Hadoop, which handles the data volume challenge successfully. Big data management
involves different tools, techniques and various algorithms for job scheduling in Hadoop. This
paper can help a novice who wants to pursue a career in the field of big data.

5. Future Direction:

This work can be extended by developing a new job scheduling algorithm that considers all the
relevant parameters to achieve better performance. Second, user profiles (similar users) and usage
profiles (invoked services) could be taken into account, and related collaborative filtering
techniques could be considered for integration with service selection approaches.

6. References:

[1] Sreedhar C.N, Kasiviswanath, P. Chenna Reddy, "A Survey on Big Data Management and
Job Scheduling" International Journal of Computer Applications (0975-8887),
Volume 130, No. 13, November 2015.
[2] Dawei Jiang, Sai Wu, Gang Chen, Beng Chin Ooi, Kian-Lee Tan and Jun Xu, "epiC: an
extensible and scalable system for processing Big Data" The VLDB Journal (2016) 25:3-26,
DOI 10.1007/s00778-015-0393-2.
[3] Marcos D. Assunção, Rodrigo N. Calheiros, Silvia Bianchi, Marco A.S. Netto, Rajkumar
Buyya, "Big Data computing and clouds: Trends and future directions" J. Parallel Distrib.
Comput. 79-80 (2015) 3-15.
[4] Xiongpai Qin and Xiaoyun Zhou, "A Survey on Benchmarks for Big Data and Some More
Considerations" H. Yin et al. (Eds.): IDEAL 2013, LNCS 8206, pp. 619-627, 2013.
Springer-Verlag Berlin Heidelberg 2013.
[5] Leonardo Neumeyer, Bruce Robbins, Anish Nair, Anand Kesari, "S4: Distributed Stream
Computing Platform" 2010 IEEE International Conference on Data Mining Workshops.
[6] Rakesh Varma,"Survey on MapReduce and Scheduling Algorithms in Hadoop" International
Journal of Science and Research (IJSR) ISSN (Online): 2319-7064 Index Copernicus Value
(2013): 6.14 | Impact Factor (2013): 4.438.
[7] Suman Arora and Dr. Madhu Goel, "Survey Paper on Scheduling in Hadoop" Volume 4,
Issue 5, May 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer
Science and Software Engineering.
[8] Sushant Goel, Hema Sharda and David Taniar, "Distributed Scheduler for High Performance
Data-Centric Systems" 2003 IEEE.
[9] John. H. Phan, Chang. F. Quo, and May D. Wang, "Comparative Study of Microarray Data
for Cancer Research" proceedings of the 26th Annual International Conference of IEEE EMBS
San Francisco, CA, USA * September 1-5, 2004.
[10] Seema Metikurke and Vijay K. Vaishnavi, "Grid-Enabled Automatic Web Page
Classification" 2006 IEEE International Conference on Fuzzy Systems Sheraton Vancouver Wall
Centre Hotel, Vancouver, BC, Canada July 16-21, 2006.
[11] Hui Fang, Ming Yang and Ruqing Yang, "Ground Texture Matching based Global
Localization for Intelligent Vehicles in Urban Environment" Proceedings of the 2007 IEEE
Intelligent Vehicles Symposium Istanbul, Turkey, June 13-15, 2007.
[12] BI Shuoben, XU Yin, JIAO Feng, LÜ Guonian, PEI Anping, "Study on Data Mining in First
Period of Jiangzhai Site Based on the Association Algorithms" 2009 International Conference on
Artificial Intelligence and Computational Intelligence, 2009 IEEE DOI 10.1109/AICI.2009.
[13] Jasmin Azemovic, Denis Music, "Comparative analysis of efficient methods for storing
unstructured data into database with accent on performance" 2010 IEEE 2nd International
Conference on Education Technology and Computer (ICETC).
[14] Mengjie Zhou, Haoji Hu and Minqi Zhou, "Searching XML Data by SLCA on a MapReduce
Cluster" 2010 IEEE.
[15] Daniel Warneke, "Exploiting Dynamic Resource Allocation for Efficient Parallel Data
SYSTEMS, VOL. 22, NO. 6, JUNE 2011.
[16] Chang Liu, Jinjun Chen, Chi Yang, Rajiv Ranjan and Ramamohanarao Kotagiri,
"Authorized Public Auditing of Dynamic Big Data Storage on Cloud with Efficient Verifiable
[17] Shifeng Fang, Li Da Xu, Yunqiang Zhu, Jiaerheng Ahati, Huan Pei, Jianwu Yan, and Zhihui
Liu, "An Integrated System for Regional Environmental Monitoring and Management Based on
NO. 2, MAY 2014.
[18] Daisuke Takaishi, Hiroki Nishiyama, Nei Kato and Ryu Miura, "Toward Energy Efficient
Big Data Gathering in Densely Distributed Sensor Networks" 2014 IEEE.
[19] Chao Wang, Xi Li, Peng Chen, Aili Wang, Xuehai Zhou and Hong Yu, "Heterogeneous
Cloud Framework for Big Data Genome Sequencing" IEEE/ACM TRANSACTIONS ON
[20] Tao Xu, Dongsheng Wang and Guodong Liu, "Banian: A Cross-Platform Interactive Query
System for Structured Big Data" TSINGHUA SCIENCE AND TECHNOLOGY ISSN 1007-021
07/11 pp. 62-71, Volume 20, Number 1, February 2015.
[21] Marco Viceconti, Peter Hunter and Rod Hose, "Big Data, Big Knowledge: Big Data
[22] Yanhao Huang and Xiaoxin Zhou, "Knowledge Model for Electric Power Big Data
Based on Ontology and Semantic Web" CSEE JOURNAL OF POWER AND ENERGY
[23] Simon Fong, Raymond Wong, and Athanasios V. Vasilakos, "Accelerated PSO Swarm
Search Feature Selection for Data Stream Mining Big Data" IEEE TRANSACTIONS ON
[24] Hongbing Wang, Chao Yu, Lei Wan and Qi Yu, "Effective BigData-Space Service
Selection over Trust and Heterogeneous QoS Preferences" IEEE, 2015.
[25] Claudio A. Ardagna, Ernesto Damiani, Fulvio Frati, Davide Rebeccani,"A Configuration-
Independent Score-Based Benchmark for Distributed Databases" DOI
10.1109/TSC.2015.2485985, IEEE Transactions on Services Computing.
[26] Qinghua Lu, Zheng Li, Maria Kihl, Liming Zhu and Weishan Zhang, "CF4BDA: A Conceptual
Framework for Big Data Analytics Applications in the Cloud" IEEE, October 27, 2015.
[27] Andrea Marinoni, Arianna Dagliati, Riccardo Bellazzi, Paolo Gamba, "INFERRING AIR
[28] Alun Evans Javi Agenjo Josep Blat, "COMBINED 2D AND 3D WEB-BASED
[29]Syed Akhter Hossain, "Big Data Analytics in Education: Prospects and Challenges" 978-1-
4673-7231-2/15/ 2015 IEEE.
[30] Xue-Wen Chen and Xiaotong Lin, "Big Data Deep Learning: Challenges and
Perspectives" May 16, 2014, IEEE.
[31] Zhi-Hua Zhou, Nitesh V. Chawla, Yaochu Jin, Graham J. Williams, "Big Data Opportunities
and Challenges: Discussions from Data Analytics Perspectives" IEEE Computational
intelligence magazine | November 2014.
[32] MATTURDI Bardi, ZHOU Xianwei, LI Shuai, LIN Fuhong, "Big Data security and
privacy: A review" China Communications Supplement No.2 2014.
[33] Daniele Apiletti, Elena Baralis, Tania Cerquitelli, Paolo Garza, Fabio Pulvirenti, Luca
Venturini, "Frequent itemsets mining for big data: A Comparative Analysis" IEEE, Aug 2017.
[34] Dinesh J. Prajapati,Sanjay Garg, N.C. Chauhan, "MapReduce Based Multilevel Consistent
and Inconsistent Association Rule Detection from Big Data Using Interestingness Measures"
vol-9 September 2017,IEEE.
[35] Robin Genuer, Jean-Michel Poggi, Christine Tuleau-Malot, Nathalie Villa-Vialaneix,
"Random Forests for Big Data" Vol-23,IEEE 2017.
[36] M. Bakratsas, P. Basaras, D. Katsaros, L. Tassiulas, "Hadoop MapReduce Performance on
SSDs for Analyzing Social Networks" IEEE 2017.
[37] Ziliang Zong, Rong Ge, Qijun Gu, "Marcher: A Heterogeneous System Supporting Energy-
Aware High Performance Computing and Big Data Analytics" Volume 8, July 2017.
[38] Guangchen Ruan and Hui Zhang, "Closed-loop Big Data Analysis with Visualization and
Scalable Computing ". Volume 8, July 2017.
[39] Navroop Kaur , Sandeep K. Sood, "Efficient Resource Management System Based on 4Vs
of Big Data Streams " Volume 13,April 2017.
[40] Feras A. Batarseh, Eyad Abdel Latif , "Assessing the Quality of Service Using Big Data
Analytics: With Application to Healthcare" Volume4, June 2016.
[41] Gantz J, Reinsel D. The digital universe in 2020: Big data, bigger digital shadows, and
biggest growth in the Far East [J]. IDC iView: IDC Analyze the Future, 2012.
[42] Weiss R, Zgorski L. Obama Administration Unveils Big Data Initiative: Announces $200
Million in New R&D Investments [J]. Office of Science and Technology Policy, Washington,
DC, 2012. Data P. The Emergence of a New Asset Class[C]//World Economic Forum Report.
[43] Anderson C. The end of theory: the data deluge makes the scientific method obsolete. Wired
Magazine 16.07[J]. 2008.
[44] Mayer-Schönberger V, Cukier K. Big data: A revolution that will transform how we live,
work, and think [M]. Houghton Mifflin Harcourt, 2013.
[45] Ardagna C A, Damiani E. Business Intelligence meets Big Data: An Overview on Security
and Privacy [J].
[46] Manyika J, Chui M, Brown B, et al. Big data: The next frontier for innovation, competition,
and productivity [J]. 2011.
[47] Laney D. 3-D Data Management: Controlling Data Volume [J]. Velocity and Variety,
META Group Original Research Note, 2001.
[48] Beyer M. Gartner says solving big data challenge involves more than just managing
volumes of data. Gartner [J]. 2011.
[49] Beyer M A, Laney D. The importance of 'big data': a definition [J]. Stamford, CT: Gartner,
[50] Lefevre C. LHC: the guide (English version) [R]. 2009.[14] Brumfiel G. Down the petabyte
highway[J]. Nature, 2011, 469(20): 282-283.
[51] Mangelsdorf J. Supercomputing the climate: NASA's big data mission[J]. Accessed online,
2013: 11-27.
[52] Kalil T. Big data is a big deal [J]. The White House, 2012.
[53] Sheet F. Big Data Across the Federal Government [J]. 2012 03-29)[2013-03-06].
http://www.whitehouse.gov/sites/default/files/microsites/ostp/big_data_fact_sheet_final.pdf,
[54] Lampitt A. The real story of how Big Data analytics helped Obama win [J]. InfoWorld,
2013, 14.