An Overview
Cyrus Lentin
Introduction
▪ Big Data Is An Evolving Term That Describes Any Voluminous Amount Of Structured, Semi-structured
And Unstructured Data That Has The Potential To Be Mined For Information
▪ Big Data Was Originally Characterized By 3 Vs:
• The Extreme Volume Of Data
• The Wide Variety Of Types Of Data
• The Velocity At Which The Data Is Generated
▪ Then We Talked Of 5 Vs, Adding:
• The Veracity (Trustworthiness Or Uncertainty) Of Data
• The Value Of Data
▪ Today We Talk Of 8 Vs, Adding:
• Visualization: Making Sense Of Data At A Glance
• Viscosity: Would You Want To Keep The Data With You? Is It Useful Or Important?
• Virality: Is There A Chance That The Data May Go Viral? Can It Be Reused, For Example In A Post?
▪ Although Big Data Doesn't Refer To Any Specific Quantity, The Term Is Often Used When Speaking
About Petabytes (10^15 Bytes) And Exabytes (10^18 Bytes) Of Data
▪ Availability Of Data
▪ Increase In Processing Power
▪ Increase In Storage Capabilities
▪ Google introduced GFS (Google File System) and MapReduce
▪ Hadoop then provided an open-source implementation of both ideas: HDFS (Hadoop Distributed File System) & MapReduce (MR)
▪ Hadoop is a project of the Apache Software Foundation
▪ Hadoop is used by Facebook, Yahoo, Google, Twitter, LinkedIn, Rackspace
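The MapReduce model mentioned above can be illustrated with a minimal, single-process word count sketch. This is not Hadoop code and does not use any Hadoop API. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are hypothetical and chosen only for illustration. On a real cluster, the map and reduce steps would run in parallel on many machines over data stored in a distributed file system.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence (hypothetical in-memory driver)."""
    pairs = []
    for document in documents:
        pairs.extend(map_phase(document))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Illustrative input only; not taken from the slides.
counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

The map and reduce functions are kept free of shared state, which is what lets a framework split the work across machines; only the shuffle step needs to see all of the intermediate pairs.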
▪ Versions
• OS
• Hadoop
• Distribution
▪ Developed & maintained by
• Apache
▪ Packaged & distributed by
• Cloudera
• Hortonworks
• MapR
▪ Predictive Analysis
▪ Sentiment Analysis
▪ Customer Intelligence
▪ Fraud & Security Intelligence
▪ High-Performance Analytics
▪ Risk Management
▪ Operational Analysis