You are on page 1of 19

INTRODUCTION TO BIG DATA

What is Big Data


In order to understand 'Big Data', you first need to know

What is Data?

 Data is a raw and unorganized fact that required to be processed to make


it meaningful.
 Generally, data comprises facts, observations, perceptions numbers,
characters, symbols, image, etc.

Qualitative Vs Quantitative

Data can be qualitative or quantitative.

Qualitative data is descriptive information (it describes something)

Quantitative data is numerical information (numbers)


Quantitative data can be Discrete or Continuous:

Discrete data can only take certain values (like whole numbers)

Continuous data can take any value (within a range)

Discrete data is counted, Continuous data is measured


Qualitative:
He is brown and black
He has long hair
He has lots of energy
Quantitative:
Discrete:
He has 4 legs
He has 2 brothers
Continuous:
He weighs 25.5 kg
He is 565 mm tall
Information

 Information is a set of data which is processed in a meaningful way according


to the given requirement.
 Information is processed, structured, or presented in a given context to make
it meaningful and useful.
What is a Database?

# A database is a systematic collection of data.

# They support electronic storage and manipulation of data. Databases make

data management easy.

# Let us discuss a few examples: An online telephone directory uses a database

to store data of people, phone numbers, other contact details.

# Your electricity service provider uses a database to manage billing, client-

related issues, handle fault data, etc.


Data Warehouse

 Data Warehouse is a repository of information collected from multiple

sources, stored under a unified schema, and usually residing at a single site.

 Data warehouses are constructed via a process of data cleaning, data

integration, data transformation, data loading, and periodic data refreshing.

Data Mining

 Extraction of relevant data from the large amount of data is called as data

mining.

 knowledge mining from data, knowledge extraction, data/pattern analysis


Big Data

Big Data is also data but with a huge size.

Big Data is a term used to describe a collection of data that is huge in volume

and yet growing exponentially with time.

Big Data is often described as extremely large data sets that have grown

beyond the ability to manage and analyze them with traditional data

processing tools.

The primary difficulties are the acquisition, storage, searching, sharing,

analytics, and visualization of data.


Examples Of Big Data

Following are some the examples of Big Data

 The New York Stock Exchange generates about one terabyte of new

trade data per day.

Obviously this number ranges, but generally 2–6 billion shares


represents the general daily trading volumes for the NYSE
Social Media

 The statistic shows that 500+terabytes of new data get ingested into the

databases of social media site Facebook, every day.

 This data is mainly generated in terms of photo and video uploads,

message exchanges, putting comments etc.

According to IBM, every day we create 2.5 quintillion (2.5 X10 18) bytes of data,

so much that 90 percent of the data in the world today has been created in

the last two years. These data come from everywhere: sensors used to gather

climate information, posts to social media sites, digital pictures and videos

posted online, transaction records of online purchases, and cell phone GPS

signals, to name just a few.


Types Of Big Data

BigData' could be found in three forms:

Structured

Unstructured

Semi-structured
Structured

 Any data that can be stored, accessed and processed in the form of fixed

format is termed as a 'structured' data.

 Over the period of time, talent in computer science has achieved greater

success in developing techniques for working with such kind of data (where

the format is well known in advance) and also deriving value out of it.

 However, nowadays, we are foreseeing issues when a size of such data

grows to a huge extent, typical sizes are being in the rage of multiple

zettabytes.
Examples Of Structured Data

An 'Employee' table in a database is an example of Structured Data


Unstructured

 Any data with unknown form or the structure is classified as unstructured

data.

 In addition to the size being huge, un-structured data poses multiple

challenges in terms of its processing for deriving value out of it.

 A typical example of unstructured data is a heterogeneous data source

containing a combination of simple text files, images, videos etc.


Examples Of Un-structured Data

The output returned by 'Google Search'


Semi-structured

 Semi-structured data can contain both the forms of data.

 We can see semi-structured data as a structured in form but it is actually not

defined with e.g. a table definition in relational DBMS.

 Example of semi-structured data is a data represented in an XML file.

Examples Of Semi-structured Data


Personal data stored in an XML file-
Characteristics Of Big Data

Volume

Variety

Veracity

Velocity
1. Volume. Big Data comes in one size: large. Enterprises are wast with data, easily

amassing terabytes and even petabytes of information.

2. Variety. Big Data extends beyond structured data to include unstructured data of

all varieties: text, audio, video, click streams, log files, and more.

3. Veracity. The massive amounts of data collected for Big Data purposes can lead

to statistical errors and misinterpretation of the collected information. Purity of the

information is critical for value.

4. Velocity. Often time sensitive, Big Data must be used as it is streaming into the

enterprise in order to maximize its value to the business, but it must also still be

available from the archival sources as well.

You might also like