
UNSTRUCTURED DATA ANALYTICS SYSTEMS

MODULE V
INTRODUCTION

 Unstructured data is data that does not conform to a data model and has no easily identifiable structure, so a computer program cannot use it easily.

 Unstructured data is not organized in a pre-defined manner and does not have a pre-defined data model, so it is not a good fit for a mainstream relational database.


CHARACTERISTICS OF UNSTRUCTURED DATA

 Data neither conforms to a data model nor has any structure.


 Data cannot be stored in rows and columns as in a database
 Data does not follow any semantics or rules
 Data lacks any particular format or sequence
 Data has no easily identifiable structure
 Due to the lack of identifiable structure, it cannot be used by computer programs easily
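To make the contrast concrete, the sketch below reads the same fact once from a structured row and once from free text. The record, the memo wording, and the regular expression are all hypothetical and chosen only for illustration.

```python
import csv
import io
import re

# Structured: a header row names each field, so parsing is mechanical.
structured = "order_id,customer,amount\n1001,Asha,250.00\n"
rows = list(csv.DictReader(io.StringIO(structured)))
amount_from_row = float(rows[0]["amount"])

# Unstructured: the same facts in a free-text memo (hypothetical wording).
memo = "Asha called about order 1001; she says she was billed 250.00 rupees."

# With no schema, the program has to guess where the value sits in the text.
# This pattern is ad hoc and stops matching as soon as the wording changes.
match = re.search(r"billed\s+(\d+(?:\.\d+)?)", memo)
amount_from_memo = float(match.group(1)) if match else None

print(amount_from_row, amount_from_memo)
```

The structured lookup works for every row with that header, while the pattern for the memo has to be rewritten for each new phrasing.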
SOURCES OF UNSTRUCTURED DATA

 Web pages
 Images (JPEG, GIF, PNG, etc.)
 Videos
 Memos
 Reports
 Word documents and PowerPoint presentations
 Surveys
PROBLEMS FACED IN STORING UNSTRUCTURED DATA

 It requires a lot of storage space to store unstructured data.


 It is difficult to store videos, images, audio, etc.
 Due to the unclear structure, operations like update, delete and search are very difficult.
 Storage cost is high compared to structured data
 Indexing unstructured data is difficult
POSSIBLE SOLUTION FOR STORING UNSTRUCTURED DATA

 Unstructured data can be converted to easily manageable formats


 A content-addressable storage (CAS) system can be used to store unstructured data. It stores data based on its metadata and assigns a unique name to every object stored in it. An object is retrieved based on its content, not its location.
 Unstructured data can be stored in XML format.
 Unstructured data can be stored in an RDBMS that supports BLOBs (binary large objects)
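The content-addressable idea can be sketched as a toy in-memory store. This is an illustration under stated assumptions, not how any particular CAS product works: the class name `ContentAddressableStore`, its methods, and the choice of a SHA-256 digest as the unique name are all assumptions made for the example.

```python
import hashlib


class ContentAddressableStore:
    """Toy in-memory CAS: the key is derived from the object's bytes."""

    def __init__(self):
        self._objects = {}   # content address -> raw bytes
        self._metadata = {}  # content address -> descriptive metadata

    def put(self, data: bytes, **metadata) -> str:
        # Assumption: the unique name is the SHA-256 digest of the content.
        address = hashlib.sha256(data).hexdigest()
        self._objects[address] = data
        self._metadata.setdefault(address, {}).update(metadata)
        return address

    def get(self, address: str) -> bytes:
        data = self._objects[address]
        # Re-hash on read to confirm the content still matches its address.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("stored object does not match its address")
        return data

    def describe(self, address: str) -> dict:
        # Return a copy so callers cannot alter the stored metadata.
        return dict(self._metadata[address])


store = ContentAddressableStore()
first = store.put(b"Quarterly report draft", kind="report")
second = store.put(b"Quarterly report draft", author="hypothetical user")

print(first == second)  # identical content maps to a single address
print(store.get(first))
print(store.describe(first))
```

Because the address depends only on the bytes, storing the same object twice does not duplicate it, and the caller never needs to know where the object physically lives.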
NOSQL DATABASES

 NoSQL databases (aka "not only SQL") are non-tabular databases and store data differently than relational tables.

 NoSQL databases come in a variety of types based on their data model.

 The main types are document, key-value, wide-column, and graph.

 They provide flexible schemas and scale easily with large amounts of data and high user loads.
TYPES OF NOSQL DATABASES

 Document databases store data in documents similar to JSON (JavaScript Object Notation) objects.

Each document contains pairs of fields and values. The values can typically be a variety of types
including things like strings, numbers, booleans, arrays, or objects.

 Key-value databases are a simpler type of database where each item contains keys and values.

 Wide-column stores store data in tables, rows, and dynamic columns.

 Graph databases store data in nodes and edges. Nodes typically store information about people, places, and things, while edges store information about the relationships between the nodes.
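As a rough illustration of the four models, the sketch below expresses one hypothetical customer record using plain Python structures. No real database is involved; the identifiers, field names, and the `neighbours` helper are assumptions made for the example.

```python
import json

# Document: a self-contained, JSON-like object with nested fields.
document = {
    "_id": "cust-1",
    "name": "Asha",
    "active": True,
    "orders": [{"order_id": 1001, "amount": 250.0}],
}

# Key-value: an opaque value looked up by a single key.
key_value = {"cust-1": json.dumps(document)}

# Wide-column: rows keyed by id; each row may carry different columns.
wide_column = {
    "cust-1": {"name": "Asha", "city": "Kochi"},
    "cust-2": {"name": "Ravi", "phone": "hypothetical-number"},
}

# Graph: nodes hold entities, edges hold the relationships between them.
nodes = {
    "cust-1": {"label": "Person", "name": "Asha"},
    "order-1001": {"label": "Order", "amount": 250.0},
}
edges = [("cust-1", "PLACED", "order-1001")]


def neighbours(node_id):
    """Return the ids of nodes reachable over one outgoing edge."""
    return [target for source, _, target in edges if source == node_id]


print(json.loads(key_value["cust-1"])["name"])
print(sorted(wide_column["cust-2"]))
print(neighbours("cust-1"))
```

The same facts fit all four shapes. What differs is which question each shape answers cheaply: lookup by key, reading a whole nested record, sparse columns per row, or following relationships.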
BIG DATA

• Big Data is a collection of data that is huge in volume, yet growing exponentially with time.

• Its size and complexity are so great that no traditional data management tool can store or process it efficiently. In other words, big data is still data, just at a very large scale.

• Among the first organizations to work with big data were Google, eBay, Facebook, and LinkedIn.

Volume (Data Quantity)

 Volume is the base Big Data is built on. Each day, a gigantic amount of data is produced by all sorts of sources. TechJury claims that 2.5 quintillion bytes of data are created worldwide every day. That is a lot, though most of this data will never be processed.
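To put the quoted figure in more familiar units, here is a small conversion sketch. It assumes decimal (SI) units and takes the 2.5 quintillion bytes per day at face value; the variable names are illustrative.

```python
# 2.5 quintillion bytes per day, the figure quoted in the text above.
BYTES_PER_DAY = 25 * 10**17
SECONDS_PER_DAY = 24 * 60 * 60

# Decimal (SI) units: 1 EB = 10**18 bytes, 1 TB = 10**12 bytes.
exabytes_per_day = BYTES_PER_DAY / 10**18
terabytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY / 10**12

print(f"{exabytes_per_day} EB per day")
print(f"{terabytes_per_second:.1f} TB per second")
```

At this scale the figure works out to 2.5 exabytes per day, or roughly 29 terabytes every second.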

Variety (Data Types)

 Variety refers to the diversity of data types and sources. Data comes from web pages, search engines, social media, and sensor systems, and it is all raw, semi-structured, or unstructured. In many ways, enterprises struggle to turn this mess of data into a coherent flow of information.



Velocity (Data Speed)

 Velocity refers to the enormous speed at which data is generated, analyzed, and reprocessed. Nowadays data is generated in the blink of an eye, faster than most companies can process it.


WHAT IS ANALYTICS?

 Analytics is the process of discovering, interpreting, and communicating significant patterns in data.

 Quite simply, analytics helps us see insights and meaningful data that we might not otherwise detect.

 Analytics uses data and math to answer business questions, discover relationships, predict unknown outcomes and automate decisions.
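A minimal sketch of "discover relationships" and "predict unknown outcomes" is shown below, using a Pearson correlation and a least-squares line. The data set is invented toy data, and the variable names and the `predict` helper are assumptions made for the example. Real analytics work would normally rely on a statistics library rather than hand-written formulas.

```python
from math import sqrt

# Hypothetical toy data: monthly ad spend and units sold.
ad_spend = [10, 20, 30, 40, 50]
units_sold = [24, 46, 63, 88, 104]

n = len(ad_spend)
mean_x = sum(ad_spend) / n
mean_y = sum(units_sold) / n

# Sums of squares and cross-products around the means.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ad_spend, units_sold))
s_xx = sum((x - mean_x) ** 2 for x in ad_spend)
s_yy = sum((y - mean_y) ** 2 for y in units_sold)

# Discover a relationship: Pearson correlation coefficient.
correlation = s_xy / sqrt(s_xx * s_yy)

# Predict an unknown outcome: least-squares line y = intercept + slope * x.
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x


def predict(x):
    """Estimate units sold for an ad spend not present in the data."""
    return intercept + slope * x


print(f"correlation = {correlation:.3f}")
print(f"predicted units at spend 60 = {predict(60):.1f}")
```

A correlation close to 1 suggests the two quantities move together, and the fitted line turns that relationship into an estimate for a spend level that has not been observed yet.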
INTRODUCTION TO BIG DATA ANALYTICS

 Big Data analytics is a process used to extract meaningful insights, such as hidden patterns, unknown correlations, market trends, and customer preferences.

 Big Data analytics provides various advantages: it can be used for better decision making and for preventing fraudulent activities, among other things.


IMPORTANCE OF BIG DATA ANALYTICS

 Big data analytics helps organizations harness their data and use it to identify new opportunities.

 That, in turn, leads to smarter business moves, more efficient operations, higher profits and happier customers.
