First and foremost, big data can be classified by its structure. The structure of data depends
on how organizable it is, in other words, whether it can be formatted into tables of rows and
columns. There are three types of big data when classifying it by structure.
1. Structured
2. Unstructured
3. Semi-structured
1. Structured
Any data that can be stored in a database or other data management platform, and that can be
easily accessed and processed in a fixed format, is termed ‘structured’ data. Over time,
computer scientists have had considerable success in developing techniques for working with
this kind of data (where the format is well known in advance) and for deriving value from it.
Nowadays, however, issues arise when the size of such data grows to a huge extent, with typical
sizes in the range of multiple zettabytes. Structured data accounts for about 20% of the total
existing data and is used the most in programming and computer-related activities.
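As a minimal sketch of what "a format well known in advance" means in practice, the Python snippet below stores a few rows in an in-memory SQLite table and queries them by column name. The table name, column names and sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The schema is fixed up front: every row must fit these columns.
# (Hypothetical table and column names, for illustration only.)
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL, age INTEGER NOT NULL)"
)

# Made-up sample rows.
sample_rows = [(1, "Asha", 35), (2, "Ben", 41), (3, "Chen", 29)]
conn.executemany("INSERT INTO person (id, name, age) VALUES (?, ?, ?)", sample_rows)

# Because the format is known in advance, a query can refer to columns by name.
older_than_30 = conn.execute(
    "SELECT name, age FROM person WHERE age > 30 ORDER BY age"
).fetchall()
print(older_than_30)

conn.close()
```

The point of the sketch is that the processing step (the `SELECT`) needs no guessing about where each value lives, which is what makes structured data easy to access and process.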
There are two sources of structured data: machine-generated and human-generated. All the data
received from sensors, web logs and financial systems is classified as machine-generated
data. This includes data from medical devices, GPS data, usage statistics captured by servers
and applications, and the huge amount of data that usually moves through trading platforms.
Human-generated structured data mainly includes all the data a human inputs into a computer,
such as a name and other personal details. When a person clicks a link on the internet or even
makes a move in a game, data is created. Companies can use this data to understand their
customers' behavior and make the appropriate decisions and modifications.
2. Unstructured
Any data with an unknown form or structure is classified as unstructured data. In addition to
its huge size, unstructured data poses multiple challenges in terms of processing it to derive
value. The majority of big data is unstructured, meaning it can’t easily be organized or
classified. A typical example of unstructured data is a heterogeneous data source containing a
combination of simple text files, images, videos, etc. Nowadays organizations have a wealth of
data available to them but, unfortunately, they don’t know how to derive value from it, since
this data is in its raw or unstructured form. Unstructured data accounts for about 80% of the
total.
Unstructured data is also classified by its source as machine-generated or human-generated.
Machine-generated data includes satellite images, scientific data from various experiments,
and radar data captured by various kinds of technology.
Human-generated unstructured data is found in abundance across the internet, since it includes
social media data, mobile data and website content. This means that the pictures we upload to our
Facebook or Instagram handles, the videos we watch on YouTube and even the text messages we
send all contribute to the gigantic heap that is unstructured data.
3. Semi-structured
The line between unstructured data and semi-structured data has always been unclear. As the
name implies, semi-structured data isn’t fully organized from the start, and most
semi-structured data appears to be unstructured at a glance. It is information that is not in a
traditional database format, as structured data is. Semi-structured data can look structured in
form, but it is not actually defined by, for example, a table definition in a relational DBMS.
Examples of semi-structured data are data represented in an XML file or in JSON, such as the
XML records below.
<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
<rec><name>Satish Mane</name><sex>Male</sex><age>29</age></rec>
<rec><name>Subrato Roy</name><sex>Male</sex><age>26</age></rec>
<rec><name>Jeremiah J.</name><sex>Male</sex><age>35</age></rec>
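The records above carry their own field names as tags, which is what lets a program recover rows and columns even though no table definition exists. The Python sketch below illustrates this with the standard library's `xml.etree.ElementTree`. It makes two assumptions that are not in the original text: the records are wrapped in a hypothetical `<recs>` root element so that a standard XML parser accepts them, and `age` is treated as an integer.

```python
import xml.etree.ElementTree as ET

# The five records from the text, copied verbatim.
records = """\
<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
<rec><name>Satish Mane</name><sex>Male</sex><age>29</age></rec>
<rec><name>Subrato Roy</name><sex>Male</sex><age>26</age></rec>
<rec><name>Jeremiah J.</name><sex>Male</sex><age>35</age></rec>
"""

# Assumption: wrap the records in a single root element, because an XML
# document must have exactly one root. "recs" is a made-up name.
root = ET.fromstring(f"<recs>{records}</recs>")

# Each record describes itself through its tags, so the "columns" are
# discovered while reading rather than declared beforehand.
rows = [{child.tag: child.text for child in rec} for rec in root.findall("rec")]

# Turn the self-describing records into fixed (name, sex, age) tuples.
# Converting age to int is an assumption about the intended type.
table = [(row["name"], row["sex"], int(row["age"])) for row in rows]

for entry in table:
    print(entry)
```

Nothing in the data itself forces every record to have the same fields. A record with a missing or extra tag would still parse, and the code that builds `table` would have to decide how to handle it. That flexibility is the practical difference from a relational table definition.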