📎 Contents:
Introduction
The characteristics of big data
Volume
Velocity
Variety
Veracity
Value
Types of Data
Structured Data
Semi structured Data
Unstructured Data
Introduction
“Big Data” - large and complex datasets that cannot be processed using
traditional methods of data management.
Big data undergoes various stages, the last one of which is the analysis of the
data, which gives all the information that one can extract from that dataset.
Traditional data analysis methods - exploratory paths → consider the past and the
current form of data.
Big data analysis is predictive - it focuses on the current state and the
future outcome of the data.
Analytics is now a data-driven process - structured, clean data is used to
build a model, which is then tried on unstructured data.
Models are built using statistical and probabilistic methods while analyzing big
data → effective in making real-time predictions and detecting anomalies.
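As a rough illustration of a statistical model used for anomaly detection, the sketch below flags readings that lie far from the mean. The z-score rule, the threshold of 2.0, the function name `find_anomalies`, and the heart-rate values are all assumptions made for this example; the notes do not name a specific method.

```python
from statistics import mean, stdev


def find_anomalies(readings, threshold=2.0):
    """Return the readings whose absolute z-score exceeds the threshold."""
    mu = mean(readings)
    sigma = stdev(readings)  # sample standard deviation
    if sigma == 0:
        # All readings are identical, so nothing stands out.
        return []
    return [x for x in readings if abs(x - mu) / sigma > threshold]


# Hypothetical heart-rate readings (beats per minute) for one individual.
heart_rates = [72, 75, 71, 74, 73, 70, 76, 140]
anomalies = find_anomalies(heart_rates)
print(anomalies)
```

A simple rule like this only works on clean numeric data; real big data pipelines would typically need more robust models.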
Real-time data found around us in day-to-day life can take many forms, such as
finance or government records and research or biological data.
In the case of the healthcare sector, a huge amount of data concerning an individual
is generated on a daily basis.
Velocity
Velocity - the rate of collection of data.
Variety
Variety - the diversity of the huge amount of data being collected → the
characteristics of the data, such as whether the data is sorted, i.e. whether
data of the same category are in the same group.
Veracity
Veracity - the trustworthiness of data → the reliability of a chunk of data,
i.e. whether the collected data can be used to establish any useful relation.
Value
Value - the most important characteristic; the other characteristics depend on it.
Types of Data
Variety, one of the Vs, gives rise to the different types of big data, since it
describes the diversity of the huge amount of collected data.
Structured Data
It comprises a group of data that show a particular pattern among themselves →
easy to retrieve and analyze by any individual or any type of computer program.
Structured Query Language (SQL) plays a crucial role in arranging structured
data → generation of a database.
This type of data is very helpful in ensuring the security of the data and in
modifying the data.
In the case of the healthcare sector, structured data is very valuable in maintaining
the clinical records of an individual for the treatment of disease.
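To make the role of SQL concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `clinical_records`, its columns, and the sample rows are hypothetical and chosen only to echo the healthcare example; they do not come from any real system.

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")

# Every row follows the same fixed pattern of columns.
conn.execute(
    """
    CREATE TABLE clinical_records (
        patient_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        diagnosis  TEXT NOT NULL,
        visit_date TEXT NOT NULL
    )
    """
)

rows = [
    (1, "Patient A", "hypertension", "2023-01-10"),
    (2, "Patient B", "diabetes", "2023-01-12"),
    (3, "Patient C", "hypertension", "2023-02-03"),
]
conn.executemany("INSERT INTO clinical_records VALUES (?, ?, ?, ?)", rows)

# Retrieval: the fixed structure makes querying straightforward.
hypertension = conn.execute(
    "SELECT name FROM clinical_records WHERE diagnosis = ? ORDER BY patient_id",
    ("hypertension",),
).fetchall()

# Modification: a single record can be updated in place.
conn.execute(
    "UPDATE clinical_records SET diagnosis = ? WHERE patient_id = ?",
    ("type 2 diabetes", 2),
)
updated = conn.execute(
    "SELECT diagnosis FROM clinical_records WHERE patient_id = 2"
).fetchone()

print(hypertension, updated)
conn.close()
```

Because every record shares the same columns, both the query and the update can address data by name rather than by position in a file.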
Semi structured Data
This type of data comprises tags and elements that are used to gather data of
one type into a group, which is assembled and arranged in a hierarchical manner.
Since there is no fixed relationship among the datasets, it is difficult for a
computer program to work on them efficiently. The semi structured nature of the
data also makes it harder to store. The number and size of the attributes may
vary from one record to another within the same group.
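The tagged, hierarchical layout described above can be sketched with a small XML document parsed by Python's standard `xml.etree.ElementTree` module. The element names (`patients`, `patient`, `allergy`, `phone`) and the values are hypothetical; the point is that records in the same group need not carry the same fields.

```python
import xml.etree.ElementTree as ET

# Two records in the same group, each with a different set of fields.
xml_text = """
<patients>
  <patient id="1">
    <name>Patient A</name>
    <allergy>penicillin</allergy>
  </patient>
  <patient id="2">
    <name>Patient B</name>
    <phone>555-0100</phone>
    <allergy>latex</allergy>
    <allergy>pollen</allergy>
  </patient>
</patients>
""".strip()

root = ET.fromstring(xml_text)

records = []
for patient in root.findall("patient"):
    fields = {}
    for child in patient:
        # A tag may repeat, so collect the values of each tag in a list.
        fields.setdefault(child.tag, []).append(child.text)
    records.append({"id": patient.get("id"), "fields": fields})

for record in records:
    print(record["id"], sorted(record["fields"]))
```

The second record has a `phone` field and two `allergy` entries while the first has neither, which is the kind of variation a fixed table schema would not accommodate directly.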
Unstructured Data
It comprises a group of data that cannot be arranged and among which no
relationship can be established → a computer program cannot readily access or
process it.
This type of data is very helpful in handling the diversity of data present in a
group of datasets, but it does not guarantee the security of the data.
Modification of the dataset is also restricted due to its nature.
This type of data can be handled with the help of Extensible Markup Language
(XML).