Data Types

Big Data Types
Types of Data
Structured Data
• Structured data consists of clearly defined data types whose pattern
makes them easily searchable.
• It is the kind of data that is easy to define, store, and analyze.
• Structured data analytics is a mature process and technology.
• Structured data usually resides in relational databases (RDBMS).
• Fields store length-delineated data such as phone numbers, Social Security
numbers, or ZIP codes.
• Even text strings of variable length, such as names, are contained in records,
making them simple to search.
• Data may be human- or machine-generated, as long as it is created
within an RDBMS structure.
• This format is eminently searchable, both with human-generated queries
and via algorithms that use field names and data types such as
alphabetic, numeric, currency, or date.
• Common relational database applications with structured data include
airline reservation systems, inventory control, sales transactions, and ATM
activity. Structured Query Language (SQL) enables queries on this type of
structured data within relational databases.
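As a minimal sketch of how SQL queries structured data, the example below uses Python's built-in sqlite3 module with an in-memory database. The table name `sales`, its columns, and the sample rows are hypothetical and chosen only for illustration; they do not come from any particular system.

```python
import sqlite3

# In-memory database; table, columns, and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "id INTEGER PRIMARY KEY, customer TEXT, zip TEXT, amount REAL, sold_on TEXT)"
)
conn.executemany(
    "INSERT INTO sales (customer, zip, amount, sold_on) VALUES (?, ?, ?, ?)",
    [
        ("Alice", "10001", 25.00, "2024-01-05"),
        ("Bob", "94105", 40.50, "2024-01-06"),
        ("Alice", "10001", 14.50, "2024-01-09"),
    ],
)


def total_by_customer(connection):
    """Sum the sale amounts per customer using field names known in advance."""
    rows = connection.execute(
        "SELECT customer, SUM(amount) FROM sales "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()
    return dict(rows)


totals = total_by_customer(conn)
print(totals)
```

Because every row has the same typed fields, the query can group and sum by column name without inspecting the content of individual records.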
Unstructured Data
• Unstructured data is essentially everything else. Unstructured data has
internal structure but is not structured via pre-defined data models or
schema.
• It may be textual or non-textual, and human- or machine-generated. It
may also be stored in a non-relational (NoSQL) database.
• On top of this, there is simply much more unstructured data than
structured data. Unstructured data makes up 80% or more of enterprise data
and is growing at a rate of 55% to 65% per year. Without the tools
to analyze this mass of data, organizations are leaving vast amounts of
valuable data on the business intelligence table.
Unstructured data is the kind that tends to defy easy
definition, takes up lots of storage capacity, and is
typically more difficult to analyze.
Unstructured data is basically information that does not
have a predefined data model, does not fit well into a
relational database, or both.
Unstructured information is typically text-heavy, but may
contain data such as dates, numbers, and facts as well.
The main takeaways that we would like to share with you:
• The amount of data (all data, everywhere) is doubling every two years.
• Our world is becoming more transparent. We, in turn, are beginning to accept
this as we become more comfortable with parting with data that we used to
consider sacred and private.
• Most new data is unstructured. Specifically, unstructured data represents almost
95 percent of new data, while structured data represents only 5 percent.
• Unstructured data tends to grow exponentially, unlike structured data, which
tends to grow in a more linear fashion.
• Unstructured data is vastly underutilized. Imagine huge deposits of oil or other
natural resources that are just sitting there, waiting to be used. That's the
current state of unstructured data as of today. Tomorrow will be a different
story.
Semi-Structured Data
The term semi-structured data is used to
describe structured data that doesn't fit into a
formal structure of data models.
However, semi-structured data does contain tags
that separate semantic elements, which
includes the capability to enforce hierarchies
within the data.
• Semi-structured data maintains internal tags and markings that identify
separate data elements, which enables information grouping and
hierarchies. Both documents and databases can be semi-structured.
• Email is a very common example of a semi-structured data type.
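To illustrate why email counts as semi-structured, the sketch below parses a short message with Python's standard email package. The header fields act as the tags that identify separate data elements, while the body remains free text. The addresses and message content are made up for the example.

```python
from email import message_from_string

# A made-up message: tagged header fields followed by a free-text body.
raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Quarterly report\n"
    "\n"
    "Hi Bob, the numbers look good. Let's talk Monday.\n"
)

msg = message_from_string(raw)

# The tagged part: each header can be looked up by name.
headers = {name: msg[name] for name in ("From", "To", "Subject")}

# The untagged part: the body is plain text with no predefined fields.
body = msg.get_payload()

print(headers)
print(body)
```

The headers can be filtered and grouped much like database fields, but extracting meaning from the body still requires text analysis.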
General obstacles with Big Data
• Unstructured data formats
• Fast-moving (streaming) data
• Multi-source data input
• Noisy and poor-quality data
• High dimensionality
• Scalability of algorithms
• Unlabelled data
• Designing flexible and highly scalable architectures
• Understanding statistical data characteristics before applying algorithms
• Developing the ability to work with larger datasets
Data Type
• Big Data is characterized by huge volume, high velocity, and a wide variety of data.
It comes in three types: structured data, semi-structured data, and unstructured data.
• Structured data –
Structured data is data whose elements are addressable for effective analysis. It
has been organized into a formatted repository, typically a database. It
covers all data that can be stored in an SQL database in tables with rows and
columns. Such data has relational keys and can easily be mapped into pre-designed
fields. Today, it is the most commonly processed kind of data and the simplest
way to manage information. Example: relational data.
• Semi-structured data –
Semi-structured data is information that does not reside in a relational
database but that has some organizational properties that make it easier
to analyze. With some processing, it can be stored in a relational database
(this can be very hard for some kinds of semi-structured data), but the
semi-structured form exists to save space. Example: XML data.
Semi-structured data
• Semi-structured data is data that does not conform to a data model
but has some structure. It lacks a fixed or rigid schema. It does not
reside in a relational database but has some organisational
properties that make it easier to analyse. With some processing, it can be
stored in a relational database.
Characteristics of semi-structured Data:
• Data does not conform to a data model but has some structure
• Data cannot be stored in the form of rows and columns as in databases
• Semi-structured data contains tags and elements (metadata) which are used to group data and
describe how the data is stored
• Similar entities are grouped together and organised in a hierarchy
• Entities in the same group may or may not have the same attributes or properties
• It does not contain sufficient metadata, which makes automation and management of data difficult
• Size and type of the same attributes in a group may differ
• Due to the lack of a well-defined structure, it cannot be used by computer programs easily
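The characteristics above can be sketched with a small JSON document, parsed with Python's standard json module. The `customers` group and its field names are hypothetical. The two entities sit in the same group and are organised hierarchically, yet they do not share the same attributes, which is why the data does not map directly onto fixed rows and columns.

```python
import json

# Hypothetical document: two entities in one group with different attributes.
doc = """
{"customers": [
  {"name": "Alice", "email": "alice@example.com"},
  {"name": "Bob", "phones": ["555-0100", "555-0101"], "address": {"city": "Pune"}}
]}
"""

data = json.loads(doc)


def attribute_names(records):
    """Return the sorted attribute names of each record in a group."""
    return [sorted(record.keys()) for record in records]


# Per-entity shapes differ even though both entities are customers.
shapes = attribute_names(data["customers"])

# A table would need one column per attribute seen anywhere in the group.
all_keys = sorted(set().union(*[r.keys() for r in data["customers"]]))

print(shapes)
print(all_keys)
```

A program consuming this data has to discover the attributes at run time instead of relying on a fixed schema.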
Using LOBs (Large Objects) for Semi-structured Data
• Document files such as XML documents or word processor files are
examples of semi-structured data. These types of documents contain data
in a logical structure that is interpreted or processed by an application, and
it is not broken down into smaller logical units when stored in the
database.
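A minimal sketch of this idea, assuming SQLite's BLOB column type as a stand-in for a LOB (the source does not name a specific database product): the XML document is stored whole in one column, and its logical structure is only interpreted by the application after retrieval. The table name `documents` and the sample order are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML document; the database never looks inside it.
xml_doc = "<order id='42'><item sku='A1' qty='2'/><item sku='B7' qty='1'/></order>"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, name TEXT, body BLOB)"
)

# Store the document as a single large object, not broken into smaller units.
conn.execute(
    "INSERT INTO documents (name, body) VALUES (?, ?)",
    ("order-42.xml", xml_doc.encode("utf-8")),
)

# Retrieve the raw bytes; only the application interprets the structure.
stored = conn.execute(
    "SELECT body FROM documents WHERE name = ?", ("order-42.xml",)
).fetchone()[0]

root = ET.fromstring(stored)
quantities = {item.get("sku"): int(item.get("qty")) for item in root.findall("item")}

print(quantities)
```

The trade-off is that the database can return the document by name but cannot query individual items inside it without application-side parsing.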
• Unstructured data –
Unstructured data is data that is not organized in a predefined manner
or does not have a predefined data model, so it is not a good fit for a
mainstream relational database. Alternative platforms exist for storing
and managing unstructured data. It is increasingly prevalent
in IT systems and is used by organizations in a variety of business
intelligence and analytics applications. Example: Word documents, PDFs,
text files, media logs.
Unstructured data
• Unstructured data is data that does not conform to a data model
and has no easily identifiable structure, so it cannot be used by a
computer program easily. Unstructured data is not organised in a
pre-defined manner or does not have a pre-defined data model, so it is not a
good fit for a mainstream relational database.
Characteristics of Unstructured Data:
• Data neither conforms to a data model nor has any structure
• Data cannot be stored in the form of rows and columns as in databases
• Data does not follow any semantics or rules
• Data lacks any particular format or sequence
• Data has no easily identifiable structure
• Due to the lack of identifiable structure, it cannot be used by computer programs
easily
Using LOBs for Unstructured Data
Overall
