
1. Introduction to Big Data



📎 Contents:
Introduction
The characteristics of big data
Volume
Velocity
Variety
Veracity
Value
Types of Data
Structured Data
Semi-structured Data
Unstructured Data

Introduction
“Big Data” - large and complex datasets that cannot be processed using
traditional methods of data management.
Big data passes through several stages; the last is the analysis of the data,
which yields whatever information can be extracted from that dataset.
Traditional data analysis is exploratory → it considers the past and the
current form of the data.
Big data analysis is predictive → it focuses on the current phase and the
future outcome of the data.

Earlier, analytics was a model-driven process.

Now, analytics is a data-driven process - structured and clean data is used to
build a model, which is then tried on unstructured data.

Models are built using statistical and probabilistic methods while analyzing big
data → effective in making real-time predictions and detecting anomalies.
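
As a minimal sketch of the statistical approach mentioned above, the Python below flags anomalies with a simple z-score test. The readings, the threshold of 2.0, and the function name `find_anomalies` are illustrative assumptions and do not come from these notes.

```python
from statistics import mean, stdev


def find_anomalies(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical heart-rate readings with one obvious outlier.
readings = [72, 75, 71, 74, 73, 70, 76, 140]
print(find_anomalies(readings))
```

A z-score test is only one of many possible statistical models; it is used here because it is short enough to read at a glance.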

Real-time data found around us in our day-to-day lives can take many forms, such as
finance or government records, research or biological data, etc.



The characteristics of big data
Volume
Volume - the sheer amount of data - plays an important role in determining the
worth of the data.

In the case of the healthcare sector, a huge amount of data concerning an individual
is generated on a daily basis.

Velocity
Velocity - the rate of collection of data.

Data circulates and builds up continuously - it is necessary to process and analyze
the data at the same rate → gain valuable information from these chunks of data.
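
One way to keep pace with arriving data is to update a result incrementally instead of reprocessing everything collected so far. The sketch below keeps a running mean; the class name `RunningMean` and the sample stream are hypothetical and only stand in for a real data feed.

```python
class RunningMean:
    """Keeps a mean up to date without storing past values."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        # Incremental update: shift the mean toward the new value.
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean


stream = [10, 20, 30, 40]  # stands in for records arriving over time
tracker = RunningMean()
for value in stream:
    tracker.update(value)
print(tracker.mean)
```

Each update costs the same amount of work no matter how much data has already been seen, which is what makes this style suitable for fast-moving data.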

Variety
Variety - the diverseness of the huge amount of data that is being collected →
characteristics of data,

whether the data is sorted - data of the same category are in the same group,

or unsorted - data are not arranged → no relationship can be established
between them.

Veracity
Veracity - the affirmation of data → the reliability of the chunk of data,

whether the collected data can be used to establish any useful relation,

whether any useful interpretation can be made or not.

→ helpful in checking the credibility of data as well as in ensuring that the
interpreted information is free of errors.
In the case of healthcare, multiple examinations are done on an individual so that the
diagnosis of a disease has no room for error and treatment can be provided
accordingly.

Value
Value - the most important characteristic; the other characteristics depend on it.



Without value, there is no interpretation, no analysis of data, and no affirmation of
data.

Types of Data
Variety, one of the Vs, gives rise to the different types of big data, since variety
itself describes the diverseness of the huge amount of collected data.

Structured Data
It comprises a group of data that show a particular pattern among themselves → easy
to retrieve and analyze by any individual or any type of computer program.

The structured data is arranged in a tabular manner - in rows and columns - to
define the characteristic features of the dataset.

Structured Query Language (SQL) plays a crucial role in the arrangement of
structured data → used to create and query a database.
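
To make the rows-and-columns idea concrete, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database. The `patients` table, its columns, and the sample rows are invented for illustration.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Columns define the characteristic features; each row is one record.
conn.execute(
    "CREATE TABLE patients ("
    "id INTEGER PRIMARY KEY, name TEXT, age INTEGER, diagnosis TEXT)"
)
conn.executemany(
    "INSERT INTO patients (name, age, diagnosis) VALUES (?, ?, ?)",
    [("Asha", 54, "diabetes"), ("Ben", 37, "asthma"), ("Chen", 61, "diabetes")],
)

# Because the structure is known in advance, retrieval is a one-line query.
rows = conn.execute(
    "SELECT name, age FROM patients WHERE diagnosis = ? ORDER BY age",
    ("diabetes",),
).fetchall()
print(rows)
conn.close()
```

The fixed schema is what lets the query name columns directly, which is the ease of retrieval that the notes attribute to structured data.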

This type of data is very helpful in affirming the security of the data, and in
modifying the data.

In the case of the healthcare sector, structured data is very valuable in maintaining
the clinical records of an individual for the treatment of disease.

Semi-structured Data



It comprises a group of data which shows a particular arrangement but fails to
define a relationship between its items.

This type of data comprises tags and elements, which are used to gather data
of one type - a certain group that is assembled and arranged in a hierarchical
manner.
Since there is no definite relationship among the datasets, it is difficult for a
computer program to work with them efficiently. Due to the semi-structured
nature of the data, storing it is harder. The size of the characteristic features
may vary within a given category.
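
The tags, elements, and hierarchy described above can be illustrated with a short XML fragment parsed by Python's standard `xml.etree.ElementTree` module. The document and its field names are made up for this sketch; note that the two records do not carry the same number of fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical records: same tags, but a varying number of child elements.
doc = """
<patients>
  <patient id="1">
    <name>Asha</name>
    <allergy>penicillin</allergy>
    <allergy>pollen</allergy>
  </patient>
  <patient id="2">
    <name>Ben</name>
  </patient>
</patients>
""".strip()

root = ET.fromstring(doc)
records = []
for patient in root.findall("patient"):
    records.append({
        "id": patient.get("id"),
        "name": patient.findtext("name"),
        # findall returns an empty list when the element is absent.
        "allergies": [a.text for a in patient.findall("allergy")],
    })
print(records)
```

The program has to cope with missing or repeated elements itself, which is one reason such data is harder to process than a fixed table.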

Unstructured Data
It comprises a group of data which cannot be arranged, nor can any type of
relationship be established between them → a computer program is restricted in
accessing it.

This type of data is very helpful in handling the diverseness of data present in a
group of datasets. But this type of data does not guarantee the security of data.
Modification of the dataset is also restricted due to its nature.
This type of data can be handled with the help of Extensible Markup Language (XML).
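
Because unstructured data such as free text has no predefined fields, a program usually has to impose some structure itself before it can use the content. The sketch below pulls one value out of a hypothetical clinical note with a regular expression; the note text and the "BP" pattern are assumptions made for the example.

```python
import re

# A made-up free-text note: no rows, columns, or tags to rely on.
note = (
    "Patient reports mild fever since Monday. "
    "BP 130/85, pulse 88. Advised rest and fluids."
)

# Look for a blood-pressure reading written as "BP <systolic>/<diastolic>".
match = re.search(r"BP\s+(\d{2,3})/(\d{2,3})", note)
bp = (int(match.group(1)), int(match.group(2))) if match else None
print(bp)
```

Pattern matching like this is brittle: a note that writes the reading differently would simply not match, which illustrates why access to unstructured data is restricted for programs.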
