Agenda:
Big Data
Big data: a growing torrent
4 Vs of Big Data
Big Data vs. DWH-DM
Challenges of Large Scale Social Network Analysis
Where does it come from?
Big data Technologies
Applications of Big data Analysis
Conclusion
Big data
With these caveats, big data will range from a few dozen
terabytes to multiple petabytes (thousands of terabytes).
Big data: a growing torrent
$600 to buy a disk drive that can store all of the world's music
[Figure: the 4 Vs of Big Data: Volume, Velocity, Variety, Variability]
Big Data vs. DWH-DM
Big Data
Multitude of data types
Structured, Semi-structured and Unstructured
Demographic, psychographic, transactional
Call center data, social media data, web log
data, sensor networks etc.
Requires new storage mechanisms, e.g. Hadoop
High dimensionality
Online versions of algorithms
Online services such as eBay, Yahoo, Amazon and
Facebook have transformed or created big data
Big Data vs. DWH-DM
Areas like genomics, astronomy, military surveillance and
RFID technology are also contributing to the explosive
growth of the field.
A jet engine's sensors send terabytes of data every hour,
which can be used to build predictive models for repair
cycles. Understanding when repairs should be done, instead
of doing traditional preventive maintenance at certain set
intervals, could be worth billions of dollars.
The challenge in big data analytics is to dig deeply, quickly
and widely.
DWH-DM
Structured data
Off-line algorithms
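The contrast between "online versions of algorithms" and "off-line algorithms" can be illustrated with a small sketch. The off-line version needs the whole data set in memory before it can answer, while the online version updates its answer one record at a time in constant memory. The names offline_mean, OnlineMeanVariance and stream_stats are hypothetical, and Welford's method is used here only as one well-known example of an online algorithm; the slides do not name a specific one.

```python
from typing import Iterable, List


def offline_mean(values: List[float]) -> float:
    """Off-line (batch) style: the full data set must be available at once."""
    return sum(values) / len(values)


class OnlineMeanVariance:
    """Online style (Welford's method): one pass, constant memory."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        """Fold a single new observation into the running statistics."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Population variance of the values seen so far."""
        return self._m2 / self.count if self.count else 0.0


def stream_stats(stream: Iterable[float]) -> OnlineMeanVariance:
    """Consume a stream record by record without storing it."""
    stats = OnlineMeanVariance()
    for x in stream:
        stats.update(x)
    return stats


# Made-up sample values, for illustration only.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
stats = stream_stats(iter(values))
```

Both approaches agree on the mean for the sample above, but only the online one could keep up with a sensor feed or web log that never ends.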
Challenges of Large Scale Social Network
Analysis
Social networking sites like Facebook, YouTube, Orkut and
Twitter are among the most popular sites on the internet.
Users of these sites form a social network (SN), which provides
a powerful means of sharing, organizing, and finding content
and contacts.
However, the rate at which SNs are growing poses many
latent challenges in maintaining the stability of their
underlying systems and the members associated with them.
Challenges of Large Scale Social Network
Analysis
Social Networks (SNs) are living networks that daily give birth
to data traces which can be up to exabytes in volume.
For example, Facebook produces more than a petabyte of data
per day. Even its logging data exceeds 25 terabytes per day.
Google creates as much information (social blogs and Orkut)
in two days now as we did from the dawn of man through
2003, i.e., one exabyte of data.
Analysts need to analyze this huge volume of SN data to
support system management activities in limited time.
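To make "limited time" concrete, the daily volumes quoted above can be turned into sustained per-second rates. This is back-of-the-envelope arithmetic only; it assumes decimal (SI) units and a perfectly even load across the day, neither of which is stated in the slides. The helper name bytes_per_second is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Assumption: decimal (SI) units, 1 TB = 10**12 bytes, 1 PB = 10**15 bytes.
TB = 10**12
PB = 10**15


def bytes_per_second(bytes_per_day: float) -> float:
    """Average sustained rate needed to keep up with a daily volume."""
    return bytes_per_day / SECONDS_PER_DAY


total_rate = bytes_per_second(1 * PB)     # the "petabyte per day" figure
logging_rate = bytes_per_second(25 * TB)  # the "25 terabytes per day" figure

print(f"1 PB/day  ~ {total_rate / 10**9:.1f} GB/s")
print(f"25 TB/day ~ {logging_rate / 10**6:.0f} MB/s")
```

Under these assumptions, a petabyte per day works out to roughly 11.6 GB every second, and the logging data alone to roughly 289 MB every second.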
Big data and Big Brother
Perhaps one of the biggest contributors to big data, however,
is social networking.
People themselves have become contributors of information
as they increasingly use services such as Facebook and
LinkedIn to connect with each other.
LinkedIn is a particularly interesting target, given the
professional nature of its audience. By analyzing LinkedIn
network information, we can learn a lot about individuals and
the people that they know.
While it may be difficult to manipulate big data at a grand
scale, it is relatively easy, given the right tools and techniques,
to analyze small subsets (such as personal networks of
contacts) for potentially useful results.
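As a minimal sketch of analyzing a small subset, the code below extracts one person's personal network of contacts from an edge list and ranks friends-of-friends by the number of shared contacts. The edge list is made-up sample data, and the function names (build_graph, ego_network, suggested_contacts) are hypothetical. It does not use any LinkedIn or Facebook API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build an undirected adjacency map from (person, person) pairs."""
    graph: Graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def ego_network(graph: Graph, person: str) -> Set[Tuple[str, str]]:
    """Edges among `person` and their direct contacts only."""
    members = {person} | graph.get(person, set())
    return {
        (a, b)
        for a in members
        for b in graph.get(a, set())
        if b in members and a < b  # a < b keeps each undirected edge once
    }


def suggested_contacts(graph: Graph, person: str) -> Dict[str, int]:
    """Friends-of-friends, scored by the number of shared contacts."""
    direct = graph.get(person, set())
    scores: Dict[str, int] = defaultdict(int)
    for friend in direct:
        for candidate in graph.get(friend, set()):
            if candidate != person and candidate not in direct:
                scores[candidate] += 1
    return dict(scores)


# Made-up sample data, for illustration only.
edges = [
    ("ana", "bo"), ("ana", "cy"), ("bo", "cy"),
    ("bo", "dee"), ("cy", "dee"), ("dee", "eli"),
]
graph = build_graph(edges)
ana_network = ego_network(graph, "ana")
ana_suggestions = suggested_contacts(graph, "ana")
```

For "ana", the personal network contains only the three edges among ana, bo and cy, and "dee" is suggested with a score of 2 because both of ana's contacts know her. The work is proportional to the size of the neighborhood, not of the whole network, which is why small subsets are tractable.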
HBase:
Part of the Apache Hadoop project, and modeled on
Google's Bigtable.
Suitable for extremely large databases (billions of rows,
millions of columns), distributed across thousands of
nodes. Along with Hadoop, commercial support is
provided by Cloudera.
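The Bigtable-style data model behind HBase can be pictured as a sparse, sorted map: row key, then column ("family:qualifier"), then timestamp, then value. The sketch below is an in-memory toy that illustrates only that shape. ToyWideColumnTable and its methods are hypothetical names, not the HBase client API, and nothing here is distributed or persistent.

```python
from collections import defaultdict
from typing import Dict, Iterator, Optional, Tuple


class ToyWideColumnTable:
    """In-memory toy of a wide-column table; NOT the HBase client API."""

    def __init__(self) -> None:
        # row key -> "family:qualifier" -> {timestamp: value}
        self._rows: Dict[str, Dict[str, Dict[int, bytes]]] = defaultdict(
            lambda: defaultdict(dict)
        )

    def put(self, row: str, column: str, value: bytes, ts: int) -> None:
        """Store one version of one cell; unset columns take no space."""
        self._rows[row][column][ts] = value

    def get(self, row: str, column: str) -> Optional[bytes]:
        """Return the newest version of a cell, or None if it is absent."""
        versions = self._rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start: str, stop: str) -> Iterator[Tuple[str, Dict[str, bytes]]]:
        """Yield rows with start <= key < stop, in sorted row-key order."""
        for key in sorted(self._rows):
            if start <= key < stop:
                latest = {col: v[max(v)] for col, v in self._rows[key].items()}
                yield key, latest


# Made-up sample rows, for illustration only.
table = ToyWideColumnTable()
table.put("user#001", "info:name", b"Ana", ts=1)
table.put("user#001", "info:name", b"Ana B.", ts=2)  # newer version of the cell
table.put("user#002", "info:email", b"bo@example.com", ts=1)
table.put("video#001", "meta:title", b"Intro", ts=1)

user_rows = [key for key, _ in table.scan("user#", "user#~")]
```

Because each row stores only the columns that were actually written, a table can have millions of possible columns without every row paying for them. Keeping row keys sorted is what makes range scans such as "all keys starting with user#" cheap.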
Prevalence of Big Data
Big data is not limited to big companies like Facebook and
Google.
According to a McKinsey Global Institute study in 2011:
Most investment firms in the U.S. with fewer than 1,000
employees have 3.8 petabytes of data stored.
Companies in all sectors have at least 100 terabytes stored.
Big Data And You
Big Data Formats
Big data Technologies
Big data technologies describe a new generation of
technologies and architectures, designed to economically
extract value from very large volumes of a wide variety of
data, by enabling high velocity capture, discovery, and/or
analysis.