
What is Big Data?

Big data is a combination of structured, semi-structured, and unstructured data collected by organizations that can be mined for information and used in machine learning projects, predictive modeling, and other advanced analytics applications.

Systems that process and store big data have become a common component of data
management architectures in organizations, combined with tools that support big data
analytics uses. Big data is often characterized by the three V's:

● the large volume of data in many environments;
● the wide variety of data types frequently stored in big data systems; and
● the velocity at which much of the data is generated, collected, and processed.

These characteristics were first identified in 2001 by Doug Laney, then an analyst at consulting
firm Meta Group Inc.; Gartner further popularized them after it acquired Meta Group in 2005.
More recently, several other V's have been added to different descriptions of big data,
including veracity, visualization, value, and variability.

Although big data doesn't equate to any specific volume of data, big data deployments often involve terabytes, petabytes, and even exabytes of data created and collected over time.

Understanding the 7 V's of Big Data

How do you define big data? The seven V’s sum it up pretty well
– Volume, Velocity, Variety, Variability, Veracity, Visualization, and Value.

The “Big” in Big Data distinguishes data sets of such grand scale that traditional database
systems are not up to the task of adequately processing the information. However, there is more
to what makes Big Data big than simply its scale. Doug Laney, who first framed it this way at Meta Group (later acquired by Gartner), described Big Data as consisting of the three dimensions of high volume, high velocity, and high variety, but there are other "Vs" that help in comprehending Big Data's true nature and its implications.

Volume

When discussing Big Data volumes, almost unimaginable sizes and unfamiliar numerical terms are required (a quick arithmetic check of these figures follows the list):

● Each day, the world produces 2.5 quintillion bytes of data. That is 2.5 exabytes, or roughly 2.5 billion gigabytes.
● By 2020, we will have created 40 zettabytes of data, a figure often quoted as 43 trillion gigabytes.
● The average company already stores about 100 terabytes of data.
● Facebook users alone upload roughly that much data every day.
● Walmart alone processes over a million transactions per hour.
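
As a quick sanity check on these figures, the short Python sketch below converts the quoted quantities, assuming decimal SI prefixes (note that the widely quoted 43-trillion-gigabyte figure differs slightly from the straight decimal conversion shown here):

# Sanity-check the volume figures above using decimal SI prefixes
# (1 GB = 10**9 bytes, 1 EB = 10**18 bytes, 1 ZB = 10**21 bytes).

GB, EB, ZB = 10**9, 10**18, 10**21

daily_bytes = 2.5 * 10**18        # 2.5 quintillion bytes per day
print(daily_bytes / EB)           # 2.5 -> 2.5 exabytes
print(daily_bytes / GB)           # 2.5e9 -> about 2.5 billion gigabytes

total_2020 = 40 * ZB              # 40 zettabytes
print(total_2020 / GB)            # 4e13 -> about 40 trillion gigabytes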

Velocity

Underlying the volume numbers is an even larger trend, which is that 90 percent of extant data
have been created in just the last two years. The speed at which data are generated, accumulated
and analyzed is on a steep acceleration curve. Soon there will be some 19 billion network connections globally feeding this velocity.
Although most data are warehoused before analysis, there is an increasing need for real-time processing of these enormous volumes, such as the 200 million emails, 300,000 tweets and 100 hours of YouTube video that pass by every minute of the day. Real-time processing reduces storage requirements while enabling more responsive, accurate, and profitable decisions.
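
To make concrete how real-time processing can reduce storage requirements, here is a minimal Python sketch of streaming aggregation: it keeps only a couple of running counters rather than warehousing every record. The event generator and its field names are invented for illustration:

# Minimal sketch: incremental (streaming) aggregation in plain Python.
# Only constant-size state is kept, so raw events never need storing.

import random

def event_stream(n):
    # Hypothetical stand-in for a live feed such as tweets or emails.
    for _ in range(n):
        yield {"bytes": random.randint(200, 5000)}

count, total = 0, 0
for event in event_stream(1_000_000):
    count += 1                    # update running totals per event
    total += event["bytes"]

print(f"events={count}, mean size={total / count:.1f} bytes")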

Variety

Another challenge of Big Data processing goes beyond the massive volumes and increasing velocities of data: manipulating the enormous variety of these data. Taken as a whole,
these data appear as an indecipherable mass without structure. Consisting of natural language,
hashtags, geospatial data, multimedia, sensor events and so much more, the extraction of
meaning from such diversity requires ever-increasing algorithmic and computational power.
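
A minimal sketch of what taming this variety can look like in practice: the Python snippet below funnels three hypothetical source formats (semi-structured JSON, a structured sensor reading, and unstructured text) into one common record shape. The formats, field names, and normalize() helper are illustrative assumptions:

# Normalize heterogeneous inputs into a single record shape so that
# downstream analytics can treat them uniformly.

import json
from datetime import datetime, timezone

def normalize(raw, kind):
    now = datetime.now(timezone.utc).isoformat()
    if kind == "json":                      # semi-structured
        doc = json.loads(raw)
        return {"ts": now, "kind": "json", "text": doc.get("message", "")}
    if kind == "sensor":                    # structured sensor reading
        sensor_id, value = raw.split(",")
        return {"ts": now, "kind": "sensor", "text": f"{sensor_id}={value}"}
    return {"ts": now, "kind": "text", "text": raw}  # unstructured fallback

records = [
    normalize('{"message": "checkout failed"}', "json"),
    normalize("temp-07,21.4", "sensor"),
    normalize("Loving the new release! #launch", "text"),
]
print(records)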

Variability

Furthermore, the intrinsic meanings and interpretations of these conglomerations of raw data depend on their context. This is especially true with natural language processing. A single word may have multiple meanings. New meanings are created and old meanings are discarded over time. Interpreting connotations is, for instance, essential to gauging and responding to
social media buzz. The boundless variability of Big Data, therefore, presents a unique decoding
challenge if one is to take advantage of its full value.
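
As a toy illustration of this decoding challenge, the sketch below scores the same word differently depending on surrounding context. The word lists and rule are invented for illustration and are nothing like a real natural language model:

# Toy sketch: one word, two meanings, resolved by crude context rules.

def gauge_sentiment(post):
    words = post.lower().split()
    # "sick" reads as negative near "feel", but as slang praise otherwise
    if "sick" in words:
        return "negative" if "feel" in words else "positive"
    return "neutral"

print(gauge_sentiment("I feel sick after that update"))  # negative
print(gauge_sentiment("that demo was sick"))             # positive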

Veracity

Understanding what Big Data is telling you is one thing. However, it is useless if the data being
analyzed are inaccurate or incomplete. This situation arises when data streams originate from
diverse sources presenting a variety of formats with varying signal-to-noise ratios. By the time
these data arrive at a Big Data analysis stage, they may be rife with accumulated errors that are
difficult to sort out. It almost goes without saying that the veracity of the final analysis is degraded if the data it works with are not first cleaned up.
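
A minimal sketch of such a cleanup step, in Python: records are validated against a simple schema before they reach the analysis stage. The field name, plausible range, and repair policy are hypothetical assumptions:

# Reject or repair records before analysis to protect veracity.

def clean(records):
    good, bad = [], []
    for r in records:
        try:
            temp = float(r["temp_c"])
        except (KeyError, TypeError, ValueError):
            bad.append(r)                 # missing or non-numeric value
            continue
        if -50.0 <= temp <= 60.0:         # plausible sensor range
            good.append({**r, "temp_c": temp})
        else:
            bad.append(r)                 # out-of-range noise
    return good, bad

good, bad = clean([{"temp_c": "21.5"}, {"temp_c": "999"}, {"temp_c": None}])
print(len(good), "kept;", len(bad), "rejected")   # 1 kept; 2 rejected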

Visualization

A core task for any Big Data processing system is to transform data of immense scale into something easily comprehended and actionable. For human consumption, one of the best methods for this is converting the data into graphical formats. Spreadsheets and even three-dimensional visualizations are often not up to the task, however, due to the attributes of velocity and variety.
There may be a multitude of spatial and temporal parameters and relationships between them to
condense into visual forms.
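
As one minimal sketch of condensing many parameters into a visual form (assuming matplotlib is installed; the hour-by-region grid is invented data):

# Condense a grid of hypothetical hour-by-region event counts
# into a single heatmap image.

import random
import matplotlib.pyplot as plt

counts = [[random.randint(0, 100) for _ in range(24)] for _ in range(8)]

plt.imshow(counts, aspect="auto", cmap="viridis")
plt.xlabel("hour of day")
plt.ylabel("region index")
plt.colorbar(label="event count")
plt.savefig("events_heatmap.png")   # one image summarizing 192 numbers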

Value

No one doubts that Big Data offers an enormous source of value to those who can deal with its
scale and unlock the knowledge within. Not only does Big Data offer new, more effective
methods of selling but also vital clues to new products to meet previously undetected market
demands. Many industries utilize Big Data in the quest for cost reductions for their organizations
and their customers. Those who offer the tools and machines to handle Big Data, its analysis, and
visualization also benefit hugely, albeit indirectly.

Although Volume, Velocity, and Variety are intrinsic to Big Data itself, the other Vs of
Variability, Veracity, Value, and Visualization are important attributes that reflect the gigantic
complexity that Big Data presents to those who would process, analyze and benefit from it. All
of them demand careful consideration, especially for enterprises not already on the Big Data
bandwagon. These businesses may find that their current best practices related to data handling
will require thorough revamping in order to stay ahead of the seven Vs.

Why is big data important?

Companies use big data in their systems to improve operations, provide better customer service,
create personalized marketing campaigns and take other actions that, ultimately, can increase
revenue and profits. Businesses that use it effectively hold a potential competitive advantage over those that don't because they're able to make faster and more informed business decisions. In particular, organizations can use big data to:

1) streamline resource management,
2) improve operational efficiencies,
3) optimize product development,
4) drive new revenue and growth opportunities, and
5) enable smart decision-making.

For example, big data provides valuable insights into customers that companies can use to refine
their marketing, advertising, and promotions in order to increase customer engagement and
conversion rates. Both historical and real-time data can be analyzed to assess the evolving
preferences of consumers or corporate buyers, enabling businesses to become more responsive to
customer wants and needs. When you combine big data with high-performance analytics, you
can accomplish business-related tasks such as:

● Determining root causes of failures, issues and defects in near-real time.

● Spotting anomalies faster and more accurately than the human eye (see the sketch after this list).

● Improving patient outcomes by rapidly converting medical image data into insights.

● Recalculating entire risk portfolios in minutes.

● Sharpening deep learning models' ability to accurately classify and react to changing variables.

● Detecting fraudulent behavior before it affects your organization.

These are some of the business benefits organizations can get by using big data.
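
For the anomaly-spotting task above, here is a minimal Python sketch using a z-score rule. The transaction amounts and the threshold of 2.0 are illustrative, not drawn from the sources; real systems use far more robust methods:

# Flag values that sit far from the mean in standard-deviation terms.

from statistics import mean, stdev

amounts = [52.0, 48.5, 55.1, 50.2, 49.9, 51.3, 980.0, 47.8]
mu, sigma = mean(amounts), stdev(amounts)

anomalies = [a for a in amounts if abs(a - mu) / sigma > 2.0]
print(anomalies)   # [980.0]; the outlier stands out immediately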

How big data works

Big data gives you new insights that open up new opportunities and business models. Getting
started involves three key actions:

1. Integrate
Big data brings together data from many disparate sources and applications. Traditional data integration mechanisms, such as extract, transform, and load (ETL), generally aren't up to the task. Analyzing big data sets at terabyte, or even petabyte, scale requires new strategies and technologies.
During integration, you need to bring in the data, process it, and make sure it’s formatted and
available in a form that your business analysts can get started with.
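
A minimal sketch of this integrate step, assuming only Python's standard library (the source data, table name, and schema are invented): extract from two disparate sources, transform to one shape, and load into a queryable store.

# Tiny end-to-end ETL: CSV and JSON sources -> one SQLite table.

import csv, io, json, sqlite3

crm_csv = "id,name\n1,Ada\n2,Grace\n"              # source A: CSV export
web_json = '[{"id": 3, "name": "Linus"}]'          # source B: JSON feed

rows = [(r["id"], r["name"]) for r in csv.DictReader(io.StringIO(crm_csv))]
rows += [(str(d["id"]), d["name"]) for d in json.loads(web_json)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (id TEXT, name TEXT)")
con.executemany("INSERT INTO customers VALUES (?, ?)", rows)
print(con.execute("SELECT COUNT(*) FROM customers").fetchone()[0])  # 3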

2. Manage
Big data requires storage. Your storage solution can be in the cloud, on-premises, or both. You
can store your data in any form you want and bring your desired processing requirements and
necessary process engines to those data sets on an on-demand basis. Many people choose their
storage solution according to where their data is currently residing. The cloud is gradually
gaining popularity because it supports your current computing requirements and enables you to
spin up resources as needed.

3. Analyze
Your investment in big data pays off when you analyze and act on your data. Get new clarity
with a visual analysis of your varied data sets. Explore the data further to make new discoveries.
Share your findings with others. Build data models with machine learning and artificial
intelligence. Put your data to work.
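
As a minimal sketch of the model-building part of this step (assuming scikit-learn is installed; the features, labels, and their meanings are invented for illustration):

# Fit a simple classifier on hypothetical customer features.

from sklearn.linear_model import LogisticRegression

X = [[120, 1], [80, 0], [200, 1], [60, 0]]  # [monthly_spend, clicked_promo]
y = [1, 0, 1, 0]                            # 1 = customer converted

model = LogisticRegression().fit(X, y)
print(model.predict([[150, 1]]))            # predicted class for a new customer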

Big data challenges

Along with processing capacity issues, designing a big data architecture is a common
challenge for users. Big data systems must be tailored to an organization's particular needs, a
DIY undertaking that requires IT and data management teams to piece together a customized set
of technologies and tools. Deploying and managing big data systems also require new skills
compared to the ones that database administrators and developers focused on relational software
typically possess.

Both of those issues can be eased by using a managed cloud service, but IT managers need to
keep a close eye on cloud usage to make sure costs don't get out of hand. Also, migrating
on-premises data sets and processing workloads to the cloud is often a complex process.

Other challenges in managing big data systems include making the data accessible to data
scientists and analysts, especially in distributed environments that include a mix of different
platforms and data stores. To help analysts find relevant data, data management and analytics
teams are increasingly building data catalogs that incorporate metadata management and data
lineage functions. The process of integrating sets of big data is often also complicated,
particularly when data variety and velocity are factors.
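
To make the data catalog idea concrete, here is a minimal sketch of one catalog entry carrying metadata and lineage, with a naive keyword search. Every name and field in it is a hypothetical example:

# A toy data catalog: metadata plus upstream lineage per data set.

catalog = {
    "sales.daily_revenue": {
        "owner": "analytics-team",
        "format": "parquet",
        "columns": ["date", "region", "revenue"],
        "lineage": ["crm.orders", "web.checkout_events"],  # upstream sources
        "updated": "daily at 02:00 UTC",
    }
}

def find(keyword):
    # Naive search across data set names, columns, and lineage.
    return [name for name, meta in catalog.items()
            if keyword in name or keyword in meta["columns"]
            or keyword in meta["lineage"]]

print(find("revenue"))   # ['sales.daily_revenue']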

References

1. https://bigdatapath.wordpress.com/2019/11/13/understanding-the-7-vs-of-big-data/
2. https://www.sas.com/en_in/insights/big-data/what-is-big-data.html
3. https://www.techtarget.com/searchdatamanagement/definition/big-data
4. https://www.oracle.com/in/big-data/what-is-big-data/#how
