
● What is Big Data?

Big data is data that contains greater variety, arrives in increasing volumes, and moves
with ever-higher velocity. These characteristics are known as the three Vs.

Put simply, big data is larger, more complex data sets, especially from new data
sources. These data sets are so voluminous that traditional data processing software
just can’t manage them. But these massive volumes of data can be used to address
business problems you wouldn’t have been able to tackle before.

● The Three Vs of Big Data:

1. Volume: The amount of data matters. With big data, you’ll have to process high
volumes of low-density, unstructured data. This can be data of unknown value,
such as Twitter data feeds, clickstreams on a web page or mobile app, or
readings from sensor-enabled equipment. For some organizations, this might be
tens of terabytes of data. For others, it may be hundreds of petabytes.

2. Velocity: Velocity is the fast rate at which data is received and (perhaps) acted
on. Normally, the highest-velocity data streams directly into memory rather than
being written to disk. Some internet-enabled smart products operate in real time
or near real time and require real-time evaluation and action.

3. Variety: Variety refers to the many types of data that are available. Traditional
data types were structured and fit neatly in a relational database. With the rise of
big data, data arrives in new unstructured types. Unstructured and
semistructured data types, such as text, audio, and video, require additional
preprocessing to derive meaning and support metadata (sketched below).
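
To make that preprocessing step concrete, here is a minimal Python sketch (with
entirely hypothetical records) that turns semi-structured JSON text into structured
rows and derives a simple metadata field along the way:

```python
import json

# Hypothetical semi-structured input: JSON lines with missing fields and
# one malformed record, as is typical of raw big data feeds.
raw_events = [
    '{"user": "a17", "text": "great product!", "ts": 1700000000}',
    '{"user": "b42", "ts": 1700000050}',
    'not valid json at all',
]

structured = []
for line in raw_events:
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        continue  # skip malformed records
    structured.append({
        "user": event.get("user", "unknown"),
        "text": event.get("text", ""),        # default for missing fields
        "timestamp": event.get("ts"),
        "has_text": bool(event.get("text")),  # derived metadata
    })

print(structured)  # uniform, structured rows ready for analysis
```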

● Examples of Big Data:

1. Marketing: Marketers have targeted ads since well before the internet, but they
did it with minimal data, guessing at what consumers might like based on their
TV and radio consumption, their responses to mail-in surveys, and insights from
unfocused one-on-one "depth" interviews. Big data lets them target ads based
on consumers’ actual browsing and purchasing behavior instead.

2. Transportation: Maps to apps. That’s the nutshell version of how navigation
has been transformed by technology, with the vast majority of smartphone users
relying on their devices for directions. And those directions are courtesy of big
data: relevant information (on traffic patterns, for example) gleaned from
government agencies, satellite images, and other sources.
3. Business Insights: Most companies collect far more data than they analyze. By
one estimate, between 60 and 73 percent of it sits unused. There are several
possible reasons why, one being that many analytics tools analyze only small
randomized samples of massive data sets. Doing so speeds up the discovery
process but leaves a lot of data untapped (see the sampling sketch after this
list). For other companies, it’s just a matter of determining the value of
voluminous data and what it can actually do for them.

4. Healthcare: Big data is slowly but surely changing the healthcare landscape.
Diagnostics are migrating from clinics to wearables like the Apple Watch’s built-in
heart rate monitor and Alphabet’s in-development skin temperature monitor.
Unlike doctors, such devices can collect biometric data continuously over the
long term instead of just during appointments.
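
The sampling trade-off mentioned in the business insights example above can be
shown in a few lines. This is a hypothetical illustration using pandas: a 1 percent
random sample is much cheaper to analyze, but rare events in the full data set can
disappear from it.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
full = pd.DataFrame({"value": rng.normal(size=1_000_000)})  # synthetic data

sample = full.sample(frac=0.01, random_state=0)  # small randomized sample

# Rare outliers (values beyond 4 sigma here) may vanish from the sample.
print("outliers in full data:", int((full["value"].abs() > 4).sum()))
print("outliers in 1% sample:", int((sample["value"].abs() > 4).sum()))
```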
● Big Data Challenges:

While big data holds a lot of promise, it is not without its challenges.

First, big data is…big. Although new technologies have been developed for data
storage, data volumes are doubling in size about every two years. Organizations still
struggle to keep pace with their data and find ways to effectively store it.

But it’s not enough to just store the data. Data must be used to be valuable, and that
depends on curation. Clean data, meaning data that’s relevant to the client and
organized in a way that enables meaningful analysis, requires a lot of work. Data
scientists spend 50 to 80 percent of their time curating and preparing data before it can
actually be used.
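
As a small taste of what that curation work looks like, here is a pandas sketch with
made-up columns: deduplicating records, requiring a key field, and coercing
inconsistent types before analysis can begin.

```python
import pandas as pd

# Hypothetical raw extract with duplicates, a missing key, and messy types.
raw = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, None],
    "signup_date": ["2023-01-05", "2023-01-05", "2023-02-05", "2023-03-01", "unknown"],
    "spend": ["100", "100", "250.5", "n/a", "75"],
})

clean = (
    raw.drop_duplicates()                # remove exact duplicate rows
       .dropna(subset=["customer_id"])   # require a customer key
       .assign(
           signup_date=lambda d: pd.to_datetime(d["signup_date"], errors="coerce"),
           spend=lambda d: pd.to_numeric(d["spend"], errors="coerce"),
       )
)
print(clean)
```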

Finally, big data technology is changing at a rapid pace. A few years ago, Apache
Hadoop was the most popular technology used to handle big data. Then Apache Spark
was introduced in 2014. Today, a combination of the two frameworks appears to be the
best approach. Keeping up with big data technology is an ongoing challenge.
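
The combined approach typically pairs Hadoop’s storage layer (HDFS) with Spark’s
in-memory processing. Below is a minimal PySpark sketch of that division of labor; the
HDFS path and the event_type column are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hadoop-plus-spark").getOrCreate()

# Storage: the data lives on HDFS, managed by the Hadoop layer.
events = spark.read.json("hdfs:///data/events/")  # placeholder path

# Processing: Spark analyzes it with its fast in-memory engine.
events.groupBy("event_type").count().show()

spark.stop()
```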

● How Does Big Data Work?

1. Integrate:
Big data brings together data from many disparate sources and applications.
Traditional data integration mechanisms, such as ETL (extract, transform, and
load), generally aren’t up to the task. Analyzing big data sets at terabyte, or even
petabyte, scale requires new strategies and technologies. During integration,
you need to bring in the data, process it, and make sure it’s formatted and
available in a form that your business analysts can get started with.
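
A toy end-to-end sketch of that integration flow, with hypothetical data and schema
(SQLite stands in for a real warehouse), might look like this:

```python
import csv
import io
import sqlite3

# Extract: in practice this would pull from files, APIs, or message queues.
raw_csv = io.StringIO("id,amount\nA-1,120.50\nA-2,1999.00\nA-3,35.00\n")
rows = list(csv.DictReader(raw_csv))

# Transform: normalize types and derive a field analysts will need.
for row in rows:
    row["amount"] = float(row["amount"])
    row["is_large"] = row["amount"] > 1000

# Load: land the records somewhere analysts can query them.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id TEXT, amount REAL, is_large INTEGER)")
con.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(r["id"], r["amount"], r["is_large"]) for r in rows])
print(con.execute("SELECT * FROM orders WHERE is_large = 1").fetchall())
```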

2. Manage:
Big data requires storage. Your storage solution can be in the cloud, on
premises, or both. You can store your data in any form you want and bring your
desired processing engines to those data sets on demand. Many people choose
their storage solution according to where their data currently resides. The cloud
is gradually gaining popularity because it supports your current compute
requirements and enables you to spin up resources as needed.
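
One way to picture that flexibility: with a columnar format like Parquet, the same
write call can target local disk or cloud object storage just by changing the path. This
sketch assumes pandas with a Parquet engine (pyarrow) installed; the bucket name is
a placeholder, and writing to s3:// additionally requires the s3fs package and
credentials.

```python
import pandas as pd

df = pd.DataFrame({"sensor": ["t1", "t2"], "reading": [21.4, 22.0]})

df.to_parquet("readings.parquet")                    # on-premises / local disk
# df.to_parquet("s3://my-bucket/readings.parquet")   # same call, cloud storage
```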

3. Analyze :
Your investment in big data pays off when you analyze and act on your data. Get
new clarity with a visual analysis of your varied data sets. Explore the data further
to make new discoveries. Share your findings with others. Build data models with
machine learning and artificial intelligence. Put your data to work.
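
For the model-building step, here is a deliberately small scikit-learn sketch on
synthetic data; a real pipeline would train on the curated data sets produced in the
earlier steps.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a curated feature table.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                  # three hypothetical features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # synthetic target

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```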
● What is Hadoop?

Hadoop is an open-source software framework for storing data and running applications
on clusters of commodity hardware. It provides massive storage for any kind of data,
enormous processing power and the ability to handle virtually limitless concurrent tasks
or jobs.
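
Hadoop’s processing model is MapReduce: a mapper emits key-value pairs, the
framework shuffles and sorts them by key, and a reducer aggregates each key’s
values. Below is a self-contained word-count sketch of that logic in Python; in a real
cluster the mapper and reducer would run as separate scripts under Hadoop
Streaming, with Hadoop performing the shuffle across nodes.

```python
import sys
from itertools import groupby

def mapper(lines):
    # Emit (word, 1) for every word in the input.
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reducer(pairs):
    # Receive pairs grouped by key and sum the counts per word.
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    # Sorting locally stands in for Hadoop's distributed shuffle-and-sort.
    pairs = sorted(mapper(sys.stdin))
    for word, total in reducer(pairs):
        print(f"{word}\t{total}")
```

Running `echo "to be or not to be" | python wordcount.py` prints each word with its
count; under Hadoop the same logic runs in parallel across the cluster.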

● Why is Hadoop important?


1. Ability to store and process huge amounts of any kind of data, quickly. With data
volumes and varieties constantly increasing, especially from social media and the
Internet of Things (IoT), that's a key consideration.
2. Computing power. Hadoop's distributed computing model processes big
data fast. The more computing nodes you use, the more processing power
you have.
3. Fault tolerance. Data and application processing are protected against
hardware failure. If a node goes down, jobs are automatically redirected to
other nodes to make sure the distributed computing does not fail. Multiple
copies of all data are stored automatically.
4. Flexibility. Unlike traditional relational databases, you don’t have to
preprocess data before storing it. You can store as much data as you want
and decide how to use it later. That includes unstructured data like text,
images and videos.
5. Low cost. The open-source framework is free and uses commodity
hardware to store large quantities of data.
6. Scalability. You can easily grow your system to handle more data simply
by adding nodes. Little administration is required.
