The subject of this study is the challenges of Big Data. To understand those challenges, two important questions need to be answered: what is Big Data, and what are its characteristics? The term "Big Data" is a somewhat inaccurate designation, because it implies that preexisting data was small (which it was not) and that data size is the only challenge we face [1]. Simply put, Big Data refers to data and information that cannot be handled or processed by current traditional software systems. Big Data consists of large sets of structured and unstructured data that must be processed by advanced analytics and visualization techniques to uncover hidden patterns and find unknown correlations that improve the decision-making process.
Challenges of Big Data
Data integration and aggregation: The stream of Big Data is heterogeneous, so it is not enough to capture it and save it in a repository. For example, if we take the data of several scientific experiments, it would be useless to save them as a bunch of unrelated data sets: it is unlikely that anyone will find this data or include it in any analysis. If the data has adequate metadata it might be used, but challenges still arise from differences in the experimental details and in the structure of the hosting data records. Data analysis is a sophisticated process, involving more than simply finding, identifying, understanding, and citing data. Performing data analysis at large scale requires automating all of these steps, which in turn requires expressing the different data structures and semantics in a form that computers can understand and resolve automatically. A great deal of work has been conducted in the field of data integration, but additional effort is still required to achieve automatic, error-free solutions.
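As a minimal sketch of the metadata-driven integration described above, the following example unifies records from two hypothetical experiments whose field names, units, and sample values are all invented for illustration. The metadata (a field map plus unit converters) is what lets a program resolve the structural and semantic differences automatically:

```python
# Hypothetical sketch: metadata-driven integration of two heterogeneous
# experiment record formats into one common schema. All source names,
# field names, and sample records are invented for illustration.

# Records from two (fictional) experiments, each with its own structure.
experiment_a = [{"temp_c": 21.5, "ts": "2015-06-01"}]
experiment_b = [{"temperature_f": 70.7, "date": "2015-06-02"}]

# Metadata describing how each source maps onto the common schema.
FIELD_MAPS = {
    "experiment_a": {"temp_c": "temperature_c", "ts": "recorded_on"},
    "experiment_b": {"temperature_f": "temperature_c", "date": "recorded_on"},
}

# Per-field converters that resolve semantic differences (units, here).
CONVERTERS = {
    ("experiment_b", "temperature_f"): lambda f: round((f - 32) * 5 / 9, 1),
}

def integrate(source_name, records):
    """Translate one source's records into the common schema."""
    mapping = FIELD_MAPS[source_name]
    unified_records = []
    for rec in records:
        unified = {}
        for field, value in rec.items():
            convert = CONVERTERS.get((source_name, field), lambda v: v)
            unified[mapping[field]] = convert(value)
        unified_records.append(unified)
    return unified_records

combined = integrate("experiment_a", experiment_a) + integrate("experiment_b", experiment_b)
```

After integration, both records share the same field names and units, so a single analysis can consume them; in a real system the field maps and converters would themselves be derived from the metadata stored alongside each data set.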
Big Data: Issues, Challenges and Techniques in Business Intelligence
Conference Paper · December 2015
Background
Data is growing exponentially as it is generated and recorded by everyone and everywhere: online social networks, sensor devices, health records, human genome sequencing, phone logs, government records, and professionals such as scientists, journalists, and writers [1]. The formation of such a huge amount of data, arriving from multiple sources with high volume and velocity via a variety of digital devices, gives rise to the term Big Data. As big data grows with high velocity (speed), it becomes very complex to handle, manage, and analyze using existing traditional systems. Data stored within data warehouses is different from big data: the former is cleaned, managed, known, and trusted, while the latter includes all the warehouse data as well as the data those warehouses are not capable of storing [2]. The big data problem means that a single machine can no longer process, or even hold, all of the data we want to analyze. The only solution is to distribute the data over large clusters; one of Google's data centers, for example, contains tens of thousands of machines.
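The idea of distributing work over a cluster can be illustrated with a single-process sketch: split the data into partitions, process each partition independently (as separate machines would), and merge the partial results. The partition count and input lines below are illustrative only, and the word-count task stands in for any per-record computation:

```python
# Single-process sketch of cluster-style processing: partition the data,
# compute a partial result per partition (the "map" step), then merge
# the partials (the "reduce" step). Data and partition count are
# illustrative only.
from collections import Counter

def partition(data, n_parts):
    """Split the input into at most n_parts roughly equal chunks."""
    size = (len(data) + n_parts - 1) // n_parts
    return [data[i:i + size] for i in range(0, len(data), size)]

def map_partition(lines):
    """Local word count on one partition (runs independently per 'machine')."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def reduce_counts(partials):
    """Merge the per-partition counts into a global result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

lines = ["big data big clusters", "data over clusters", "big data"]
partials = [map_partition(p) for p in partition(lines, 3)]
totals = reduce_counts(partials)
```

Because each partition is processed with no shared state, the map step can run on thousands of machines in parallel; only the small partial results need to travel over the network to be reduced, which is what makes the cluster approach scale.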