
Types of Digital Data


 Definition
 Sources of Digital Data
 Types of Digital Data
 Structured Data
 Semi-structured Data
 Unstructured Data
Definition and Meaning of Digital Data
Definition of Digital Data
Digital describes electronic technology that generates,
stores, and processes data in terms of two states: positive
and non-positive. Positive is expressed or represented by
the number 1 and non-positive by the number 0.

Digital data is information stored on a computer system as a
series of 0's and 1's in a binary language. All data in the
computer is in digital form.

Digital data is data that represents other forms of data
using specific machine language systems that can be
interpreted by various technologies.
Meaning of Digital Data
 Digital means recording or storing information as a series
of 1s and 0s.
 The most fundamental of these systems is a binary system,
which simply stores complex audio, video or text
information in a series of binary characters, traditionally
ones and zeros.
 Digital data is a binary language.
 When you press a key on the keyboard, an electrical circuit
is closed.
 The circuit acts like a switch and has only two possible
options: open or closed.
 If you know Morse code, the idea is the same.
 A string of dashes and dots represents one letter or
number, so every message is built from just two symbols,
as in binary.
 There is no halfway or in-between.
 The status of the switch as open or closed is interpreted by
the computer as a 0 or 1.
 Each digit is known as a bit.
 Computer disks and drives store this information as lines
of 0's and 1's.
 A byte is composed of eight bits.
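The idea that text is stored as bits grouped into bytes can be sketched in a few lines of Python. This is only an illustration: the helper names `to_bits` and `from_bits` are hypothetical, and UTF-8 is assumed as the character encoding.

```python
def to_bits(text: str) -> list[str]:
    """Return one 8-character string of 0s and 1s per byte of the text."""
    # Assumption: the text is encoded as UTF-8 before being viewed as bits.
    return [format(byte, "08b") for byte in text.encode("utf-8")]


def from_bits(bits: list[str]) -> str:
    """Rebuild the original text from a list of 8-bit strings."""
    return bytes(int(group, 2) for group in bits).decode("utf-8")


bits = to_bits("Hi")
print(bits)             # ['01001000', '01101001']
print(from_bits(bits))  # Hi
```

Each list element is one byte, that is, eight bits, which matches the bit and byte definitions above.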
Sources of Digital Data
 A data source, in the context of computer science and
computer applications, is the location from which the data
being used originates.
 The data source can be a database, a dataset, or a
spreadsheet.
 Data, as we know, is massive and exists in various forms.
 If it is not classified or sourced well, it can end up wasting
precious time and resources.
 It is important that companies understand the various data
sources available and classify each one by its usability
and relevance.
 The following are a few important sources of digital data
for most industries:
1. Internal Transactional Data
2. IoT as a Big Data Source
3. Syndicated Data
4. Trading Partners Data
5. Open Data
6. Media as a Big Data Source
1. Internal Transactional Data
 Transactional data: data relating to the day-to-day
transactions.
 Specifically, transactions that happen internally (inside
the organizational boundary) produce internal transactional
data.
 Internal data is the easiest to obtain because you don't
have to negotiate a formal contract with a third party.
 It is also important to secure internal data, not only from
the outside world but also within the organization (by
granting appropriate access rights).
 This data often plays a vital role in decision making.
 Examples of internal transactional data include purchases,
returns, sales, inventory, orders, invoices, and payments.
 It supports most routine business decisions, but only after
it has been analyzed.
 Make sure the proper processes are in place, and follow
organization and staff movements very closely so that you
can inform new stakeholders of why your access to data must
remain safe.
 Although it is the easiest source, it is still worth
maintaining its confidentiality and usefulness for future
purposes.
2. IoT as a Big Data Source
 Machine-generated content, or data created by IoT devices,
constitutes a valuable source of big data.
 This data is usually generated from the sensors that are
connected to electronic devices.
 The sourcing capacity depends on the ability of the
sensors to provide real-time accurate information.
 IoT is now gaining momentum and includes big data
generated, not only from computers and smartphones, but
also possibly from every device that can emit data.
 With IoT, data can now be sourced from medical devices,
vehicular processes, video games, meters, cameras,
household appliances, and the like.
3. Syndicated Data
 Syndicated research is a research study that is conducted
and funded by a market research firm but not for any
specific client.
 The result of such research is often provided in the form of
reports, presentations, or raw data.
 Syndicated data is usually the easiest to control.
 Because you are paying a service provider to deliver data
to you, you have a contract with this provider.
 However, you still need to consider what will happen if the
service provider goes out of business, or changes its
business model.
 Companies have to make good use of this data, because it is
not free.
4. Trading Partners Data
 The case of trading partners data is very similar to the one
of syndicated data, except that the data is usually not
provided as a standalone service but as part of a broader
relationship -- for example between a retailer and a
manufacturer.
 Companies have to develop processes to collect data from
their channel partners.
 Trading partners can make a valuable contribution by
supplying data on consumer insights.
 Trading partners are often not motivated to do so, so the
company needs to develop approaches that motivate them to
supply this data consistently.
5. Open Data
 The good news with open data is that it's free -- but it's
also the bad news.
 Assuming you study carefully the terms of use and
licensing agreement for the data, you should be safe
legally.
 But there is no guarantee that this service will be provided
in the long run, or that it will be provided consistently.
 The risk associated with the access methods provided is
very high.
 And if the service is not responding, you have no recourse.
 Find multiple sources, and do not build your business on
the assumption that open data feeds will remain available
in the long run.
6. Media as a Big Data Source
 Media is the most popular source of big data, as it
provides valuable insights on consumer preferences and
changing trends.
 It is the fastest way for businesses to get an in-depth
overview of their target audience, draw patterns and
conclusions, and enhance their decision-making.
 Media includes social media and interactive platforms, like
Google, Facebook, Twitter, YouTube, and Instagram, as well as
generic media like images, videos, audio, and podcasts
that provide quantitative and qualitative insights on every
aspect of user interaction.
Types of Digital Data
- Structured Data
- Unstructured Data
- Semi-structured Data
Structured Data
1. Definition
 Data that is in an organized form (e.g. in rows and
columns) and can be used by a computer program is called
“Structured Data”.
 Structured data exists in a format created to be captured,
stored, organized and analyzed.
 Structured data are data that are organized in a format easily
used by a database or other technology.
 In the context of big data, the term structured data
generally refers to data that has a defined length and format.
 Structured data is data that has been organized into a
formatted repository (a central location in which data is stored
and managed), typically a database, so that its elements can be
made addressable for more effective processing and analysis.
2. Sources of Structured Data

- Databases (e.g., Access)
- Spreadsheets
- SQL
- OLTP systems

SQL stands for Structured Query Language. SQL lets you access and manipulate databases.
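As a minimal sketch of structured data being accessed and manipulated through SQL, the snippet below uses Python's standard-library `sqlite3` module with an in-memory database. The `orders` table, its columns, and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# An in-memory relational database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The schema fixes the fields of every record in advance (rows and columns).
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, "
    "customer TEXT NOT NULL, "
    "amount REAL NOT NULL)"
)

# Hypothetical transactional records.
conn.executemany(
    "INSERT INTO orders (order_id, customer, amount) VALUES (?, ?, ?)",
    [(1, "Asha", 250.0), (2, "Ravi", 120.5), (3, "Asha", 80.0)],
)

# Because every row has the same attributes, aggregation is a single query.
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
print(rows)  # [('Asha', 330.0), ('Ravi', 120.5)]

conn.close()
```

The fixed schema is what makes the query possible: each element is addressable by column name.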
3. Storage of Structured Data

- Relational Database
- Data Warehouse
- Spreadsheet
[Figure: Example of Structured Data]
4. Characteristics of Structured Data

- Conforms to a data model
- Data is stored in form of rows and columns (e.g., relational
  database)
- Similar entities are grouped
- Attributes in a group are the same
- Data resides in fixed fields within a record or file
- Definition, format & meaning of data is explicitly known
[Figure: Summary of Structured Data]
Unstructured Data
1. Definition
 Unstructured data is information that either does not have a
pre-defined data model or is not organized in a pre-defined
manner.
 Unstructured data represents any data that does not have a
recognizable structure.
 Unstructured data, in contrast, refers to data that doesn't fit
neatly into the traditional row and column structure of
relational databases.
 It is data that does not conform to a data model or is not
in a form that can be used easily by a computer program.
 Examples: memos, chat rooms, PowerPoint presentations, images,
audio, video, letters, research, white papers, the body of an
e-mail, etc.
Formats of Digital Data
2. Sources of Unstructured Data
- Web pages
- Memos
- Videos (MPEG, etc.)
- Images (JPEG, GIF, etc.)
- Body of an e-mail
- Word documents
- PowerPoint presentations
- Chats
- Reports
- Whitepapers
- Surveys
3. Challenges in Storage of Unstructured Data
- Storage space: The sheer volume of unstructured data and its
  unprecedented growth make it difficult to store. Audio, video,
  images, etc. occupy a huge amount of storage space.
- Scalability: Scalability becomes an issue with the increase in
  unstructured data.
- Retrieve information: Retrieving and recovering unstructured
  data is cumbersome.
- Security: Ensuring security is difficult due to the varied
  sources of data (e.g. e-mail, web pages).
- Update and delete: Updating, deleting, etc. are not easy due to
  the unstructured form.
- Indexing and searching: Indexing becomes difficult with the
  increase in data. Searching is difficult for non-text data.
3. Solution for Storage of Unstructured Data
- Change formats: Unstructured data may be converted to formats
  which are easily managed, stored and searched. For example, IBM
  is working on providing a solution which converts audio, video,
  etc. to text.
- New hardware: Create hardware which supports unstructured data,
  either complementing the existing storage devices or standing
  alone for unstructured data.
- RDBMS/BLOBs: Store in relational databases which support BLOBs
  (Binary Large Objects).
- XML: Store in XML, which tries to give some structure to
  unstructured data by using tags and elements.
- CAS: Organize files based on their metadata.
 A Binary Large Object (BLOB) is a collection of binary
data stored as a single entity in a database
management system. Blobs are typically images, audio
or other multimedia objects, though sometimes
binary executable code is stored as a blob.
 Extensible Markup Language (XML) is a markup
language that defines a set of rules for encoding
documents in a format that is both human-readable
and machine-readable.
 Content-addressable storage (CAS) is a way
of storing information that can be retrieved based on
its content, instead of its storage location. It is used
extensively to store e-mails.
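Two of these ideas can be sketched with Python's standard library. The first part stores raw bytes in a BLOB column using `sqlite3`; the second is a toy content-addressable store that uses a SHA-256 digest of the content as its retrieval key. The `attachments` table, the `ContentStore` class, and the sample payload are hypothetical, and real CAS products are considerably more involved than this in-memory dictionary.

```python
import hashlib
import sqlite3

# Stand-in for the bytes of an image or audio file (hypothetical payload).
payload = b"\x89PNG placeholder image bytes"

# --- BLOB: binary data stored as a single value in a relational table ---
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attachments (name TEXT PRIMARY KEY, content BLOB)")
conn.execute("INSERT INTO attachments VALUES (?, ?)", ("logo.png", payload))
stored = conn.execute(
    "SELECT content FROM attachments WHERE name = ?", ("logo.png",)
).fetchone()[0]
conn.close()


# --- CAS: retrieve data by what it contains, not by where it is stored ---
class ContentStore:
    """Toy in-memory content-addressable store keyed by SHA-256 digest."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._objects[key] = data  # identical content maps to the same key
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]


store = ContentStore()
key = store.put(payload)
print(stored == payload)          # True
print(store.get(key) == payload)  # True
```

Because the key is derived from the content, storing the same bytes twice yields the same key, so duplicates occupy one slot.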
4. Characteristics of Unstructured Data

- Does not conform to any data model
- Cannot be stored in form of rows and columns as in a database
- Has no easily identifiable structure
- Not in any particular format or sequence
- Does not follow any rules
- Not easily usable by a program
Semi-structured Data
1. Definition
 Semi-structured data does not conform to a data model but
has some structure. It is not in a form which can be used
easily by a computer program.
 It has some structure, but it is not organized in a
relational model, like a table.
 Semi-structured data is information that does not reside in
a relational database but that has some organizational
properties that make it easier to analyze.
 With some processing, it can be stored in a relational
database.
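The last point can be illustrated with a short sketch: an XML document whose records carry different tags is flattened into rows and loaded into a relational table. The document contents, the `FIELDS` tuple, and the `contacts_to_rows` helper are hypothetical; only the standard-library `xml.etree.ElementTree` and `sqlite3` modules are used.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical semi-structured input: the two records do not share
# the same set of tags (one has an e-mail, the other a phone number).
XML_DOC = """
<contacts>
  <contact><name>Asha</name><email>asha@example.com</email></contact>
  <contact><name>Ravi</name><phone>555-0100</phone></contact>
</contacts>
"""

# Columns chosen for the target table (an assumption of this sketch).
FIELDS = ("name", "email", "phone")


def contacts_to_rows(xml_text: str) -> list[tuple]:
    """Flatten each <contact> into a fixed-width row; missing tags become None."""
    root = ET.fromstring(xml_text.strip())
    return [
        tuple(contact.findtext(field) for field in FIELDS)
        for contact in root.findall("contact")
    ]


rows = contacts_to_rows(XML_DOC)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, email TEXT, phone TEXT)")
conn.executemany("INSERT INTO contacts VALUES (?, ?, ?)", rows)
loaded = conn.execute("SELECT name, email, phone FROM contacts").fetchall()
conn.close()

print(loaded)
```

The tags supply enough structure to map each record onto columns, while attributes that a record lacks are stored as NULL.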
2. Sources of Semi-structured Data
- E-mail
- XML
- TCP/IP packets
- Zipped files
- Binary executables
- Mark-up languages
- Integration of data from heterogeneous sources
3. Storage of Semi-structured Data
- Schemas
  • Describe the structure and content of data to some extent
  • Assign meaning to data, hence allowing automatic search and
    indexing
- Graph-based data models
  • In computing, a graph database (GDB) is a database that uses
    graph structures for representing and storing data.
  • Used for data exchange among heterogeneous sources
- XML
  • XML is a markup language. It is a semi-structured document
    language.
4. Characteristics of Semi-structured Data
- Does not conform to a data model but contains tags & elements
  (metadata)
- Cannot be stored in form of rows and columns as in a database
- Similar entities are grouped
- Attributes in a group may not be the same
- The tags and elements describe how data is stored
- Not sufficient metadata
