
Big Data

Eufris 2012

Why should I care?
McKinsey:
• $250 billion annual savings in the EU alone by enhancing the public sector
• $600 billion annual consumer surplus from using personal location data globally

• Annual growth of data is remarkable
• Data is the most valuable thing most companies have
• Data is massively underutilized


Forecast
There will be a shortage of talent necessary for organizations to take advantage of big data. By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.

What is Big Data?
"Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis." IDC
"Techniques and technologies that make handling data at extreme scale economical." Forrester
"Big Data is a technology that helps extract value from the digital universe." IDC

ABC of Big Data
Analytics
• making sense of your data, in real-time, in an easy way
Bandwidth
• ingesting, processing and delivering large amounts of data
Content
• storing, managing and retaining large amounts of data
www.netapp.com

3 V's of Big Data
Variety
• Big Data extends beyond structured data, including unstructured data of all varieties: text, audio, video, click streams, log files and more
Velocity
• Often time sensitive, Big Data must be used as it is streaming into the enterprise in order to maximize its value to the business
Volume
• Big Data comes in one size: large. Enterprises are awash with data, easily amassing terabytes and even petabytes of information
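Velocity means acting on data while it is still arriving. A minimal sketch, assuming events carry numeric timestamps and arrive in time order, of a sliding-window counter that keeps only the last N seconds of a hypothetical click stream in memory; the class name `SlidingWindowCounter` is illustrative, not from any library.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events whose timestamp lies in (newest - window_seconds, newest]."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._timestamps = deque()  # timestamps currently inside the window, oldest first

    def add(self, timestamp: float) -> int:
        """Record one event and return how many events are inside the current window."""
        self._timestamps.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Drop events that have fallen out of the window; each event is removed at most once.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps)


if __name__ == "__main__":
    counter = SlidingWindowCounter(window_seconds=60)
    # Hypothetical click-stream timestamps (seconds), processed as they arrive.
    for t in [0, 10, 30, 65, 70, 200]:
        print(t, counter.add(t))  # counts: 1, 2, 3, 3, 3, 1
```

Because each event is appended once and evicted once, the work per event is constant on average, so the counter can keep up with a fast stream without storing its full history.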

A few core concepts

Hadoop
• The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model.
• Three subprojects:
  • Hadoop Common
  • Hadoop Distributed Filesystem (HDFS)
  • Hadoop MapReduce
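HDFS stores large files by splitting them into fixed-size blocks and keeping several copies of each block on different machines. The sketch below illustrates only that idea: the function `place_blocks`, the node names and the simple round-robin placement are hypothetical (real HDFS uses a rack-aware placement policy), while the 64 MB block size and replication factor of 3 mirror common Hadoop 1.x defaults.

```python
from typing import Dict, List


def place_blocks(file_size: int, block_size: int, nodes: List[str],
                 replication: int = 3) -> Dict[int, List[str]]:
    """Split a file into fixed-size blocks and assign each block to `replication` distinct nodes.

    Simplified round-robin placement for illustration; not HDFS's actual policy.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    num_blocks = -(-file_size // block_size)  # ceiling division; the last block may be partial
    placement = {}
    for block_id in range(num_blocks):
        # Start each block on a different node so storage load spreads across the cluster.
        placement[block_id] = [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
    return placement


if __name__ == "__main__":
    MB = 1024 * 1024
    cluster = ["node1", "node2", "node3", "node4"]
    # A hypothetical 200 MB file with 64 MB blocks gives 4 blocks (the last one holds 8 MB).
    for block, replicas in place_blocks(200 * MB, 64 * MB, cluster).items():
        print(block, replicas)
```

Losing any single node still leaves two copies of every block, which is why the framework can tolerate commodity hardware failing.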

MapReduce
• Introduced by Google in 2004
[Figure: Map and Reduce phases]
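The MapReduce model splits a job into a map step that turns each input record into key/value pairs and a reduce step that combines all values sharing a key, with a shuffle in between that groups the pairs by key. A minimal single-process word-count sketch of those three phases; in Hadoop the map and reduce tasks run in parallel across the cluster and the framework performs the shuffle, so the function names here are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents: List[str]) -> Dict[str, int]:
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["big data is big", "data is valuable"]
    print(word_count(docs))  # {'big': 2, 'data': 2, 'is': 2, 'valuable': 1}
```

Because each map call sees only one record and each reduce call sees only one key, both phases can be distributed without the tasks talking to each other.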

MapReduce on App Engine
• MapReduce is an experimental, innovative, and rapidly changing new feature for App Engine

NoSQL
• Definition 1
"Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontally scalable. The original intention has been modern web-scale databases. The movement began early 2009 and is growing rapidly. Often more characteristics apply as: schema-free, easy replication support, simple API, eventually consistent, a huge data amount, and more." nosql-database.org
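Horizontal scalability usually means spreading data over more servers as it grows. One common technique for deciding which server owns a key is consistent hashing, used by Dynamo-style stores; it is not part of the definition above, and the class below is a hypothetical sketch, assuming MD5-based ring positions and 100 virtual nodes per server.

```python
import bisect
import hashlib
from typing import List


def _hash(key: str) -> int:
    """Map a string to a position on the ring (first 8 bytes of its MD5 digest)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Assigns keys to nodes so that adding a node moves only a fraction of the keys."""

    def __init__(self, nodes: List[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._ring: List[int] = []  # sorted hash positions
        self._owners = {}           # hash position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each node gets several "virtual" positions to even out the load.
        for i in range(self.vnodes):
            pos = _hash(f"{node}#{i}")
            bisect.insort(self._ring, pos)
            self._owners[pos] = node

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first node position at or after the key's hash, wrapping around.
        idx = bisect.bisect_left(self._ring, _hash(key)) % len(self._ring)
        return self._owners[self._ring[idx]]


if __name__ == "__main__":
    ring = ConsistentHashRing(["db1", "db2", "db3"])
    keys = [f"user:{i}" for i in range(1000)]
    before = {k: ring.node_for(k) for k in keys}
    ring.add_node("db4")  # scale out by one server
    moved = sum(1 for k in keys if ring.node_for(k) != before[k])
    print(f"{moved} of {len(keys)} keys moved")  # expected to be roughly a quarter
```

With naive `hash(key) % number_of_servers`, adding a fourth server would relocate most keys; here only the keys that land on the new server's positions move, and they all move to the new server.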

NoSQL
• Definition 2
"In computing, NoSQL (sometimes expanded to 'not only SQL') is a broad class of database management systems that differ from the classic model of the relational database management system (RDBMS) in some significant ways. These data stores may not require fixed table schemas, usually avoid join operations, and typically scale horizontally." Wikipedia
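Avoiding joins and fixed schemas typically means storing related data together in one self-describing document. A toy in-memory collection illustrating both ideas; `DocumentCollection` and its `__` path syntax for nested fields are hypothetical, not the API of any particular NoSQL product.

```python
from typing import Any, Dict, List

_MISSING = object()  # marks a field that a document does not have


def _value_at(doc: Any, path: str) -> Any:
    """Follow a path like 'customer__city' into nested dicts; return _MISSING if absent."""
    for part in path.split("__"):
        if not isinstance(doc, dict) or part not in doc:
            return _MISSING
        doc = doc[part]
    return doc


class DocumentCollection:
    """A toy schema-free collection: each document is a dict with its own set of fields."""

    def __init__(self) -> None:
        self._docs: List[Dict[str, Any]] = []

    def insert(self, doc: Dict[str, Any]) -> None:
        self._docs.append(doc)

    def find(self, **criteria: Any) -> List[Dict[str, Any]]:
        """Return documents where every given field (or nested path) equals the given value."""
        return [d for d in self._docs
                if all(_value_at(d, path) == value for path, value in criteria.items())]


if __name__ == "__main__":
    orders = DocumentCollection()
    # Customer data is embedded in each order (denormalized), so reading it needs no join.
    orders.insert({"id": 1, "customer": {"name": "Alice", "city": "Oslo"}, "total": 120})
    # No fixed schema: this document carries an extra "coupon" field.
    orders.insert({"id": 2, "customer": {"name": "Bob", "city": "Helsinki"},
                   "total": 80, "coupon": "SPRING"})
    print(orders.find(customer__city="Oslo"))
```

The price of embedding is duplication: if a customer's city changes, every order that embeds it must be updated, a cost a normalized relational schema with joins avoids.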

From ACID to BASE
ACID: Atomicity, Consistency, Isolation, Durability
BASE: Basically available, Soft state, Eventually consistent
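Under BASE, replicas keep accepting writes and may disagree for a while (soft state), converging once they exchange updates (eventual consistency). A minimal sketch of that behaviour, assuming a last-writer-wins rule on (timestamp, replica id) versions and a manual `sync_from` step standing in for background anti-entropy; the `Replica` class is hypothetical, and real systems may use vector clocks or other merge rules instead.

```python
from typing import Dict, Optional, Tuple

# Each stored value carries a (timestamp, replica_id) version; the higher version wins.
Version = Tuple[int, str]


class Replica:
    """One copy of a key-value store that always accepts writes locally."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.data: Dict[str, Tuple[Version, str]] = {}

    def write(self, key: str, value: str, timestamp: int) -> None:
        self._apply(key, (timestamp, self.replica_id), value)

    def read(self, key: str) -> Optional[str]:
        entry = self.data.get(key)
        return entry[1] if entry else None

    def _apply(self, key: str, version: Version, value: str) -> None:
        current = self.data.get(key)
        # Last-writer-wins: keep only the value with the highest version.
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def sync_from(self, other: "Replica") -> None:
        """Anti-entropy: pull every entry from another replica and merge it."""
        for key, (version, value) in other.data.items():
            self._apply(key, version, value)


if __name__ == "__main__":
    a, b = Replica("a"), Replica("b")
    a.write("cart", "book", timestamp=1)
    b.write("cart", "book,lamp", timestamp=2)  # concurrent write on another replica
    print(a.read("cart"), b.read("cart"))      # book book,lamp  (soft state: replicas disagree)
    a.sync_from(b)
    b.sync_from(a)
    print(a.read("cart"), b.read("cart"))      # book,lamp book,lamp  (eventually consistent)
```

Last-writer-wins keeps both replicas available at all times, but it silently discards the older of two concurrent writes, which is acceptable for some data and not for other.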

Big Data and cloud

Big Data on AWS

MapReduce on AWS
• Not yet Hadoop 1.0

MapReduce on AWS
[Figure: EC2, S3 + DynamoDB]

Google BigQuery Features
• Speed - Analyze billions of rows(!) in seconds
• Scale - Terabytes of data, trillions of records
• Simplicity - SQL-like query language, hosted on Google infrastructure
• Sharing - Powerful group- and user-based permissions using Google accounts
• Security - Secure SSL access
• Multiple access methods - Can be used by REST API, a browser-based graphical interface, a command-line tool, and Google Apps Script

BigQuery example
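BigQuery queries are written in a SQL-like language. To show the shape of a typical aggregation without a Google account, the sketch below runs a GROUP BY query against Python's built-in `sqlite3` as a local stand-in; it is not BigQuery itself, and the `page_views` table and its rows are hypothetical.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical sample rows; a real BigQuery table could hold billions of them.
SAMPLE_ROWS = [
    ("FI", "/home", 120),
    ("FI", "/news", 40),
    ("SE", "/home", 90),
    ("NO", "/home", 30),
    ("SE", "/news", 60),
]

# A GROUP BY / ORDER BY / LIMIT aggregation, the kind of query BigQuery is built to run fast.
TOP_COUNTRIES_SQL = """
    SELECT country, SUM(views) AS total_views
    FROM page_views
    GROUP BY country
    ORDER BY total_views DESC
    LIMIT ?
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory SQLite table standing in for a BigQuery table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page_views (country TEXT, url TEXT, views INTEGER)")
    conn.executemany("INSERT INTO page_views VALUES (?, ?, ?)", SAMPLE_ROWS)
    return conn


def top_countries(conn: sqlite3.Connection, limit: int) -> List[Tuple[str, int]]:
    """Return (country, total_views) pairs for the countries with the most views."""
    return conn.execute(TOP_COUNTRIES_SQL, (limit,)).fetchall()


if __name__ == "__main__":
    for country, total in top_countries(build_demo_db(), limit=2):
        print(country, total)  # FI 160, then SE 150
```

In BigQuery the table would be referenced through its dataset, and the scan and aggregation would be spread across Google's infrastructure rather than run on one machine.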

Big Data outside the cloud

Oracle Big Data Appliance
About $500,000
• 18 Oracle Sun servers
• 864 GB main memory
• 216 CPU cores
• 648 TB of raw disk storage
• 40 Gb/s InfiniBand connectivity between nodes and engineered systems
• 10 Gb/s Ethernet connectivity
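Dividing the appliance totals by the 18 servers gives the per-node configuration, and dividing the price by capacity gives a rough cost per terabyte. The 3-way replication used for the usable-capacity figure is an assumption (the HDFS default), not something stated on the slide.

```python
# Figures taken from the Oracle Big Data Appliance slide.
PRICE_USD = 500_000
SERVERS = 18
TOTAL_MEMORY_GB = 864
TOTAL_CORES = 216
RAW_DISK_TB = 648
# Assumption (not on the slide): data stored with 3-way replication, the HDFS default.
REPLICATION = 3

per_server = {
    "memory_gb": TOTAL_MEMORY_GB / SERVERS,
    "cores": TOTAL_CORES / SERVERS,
    "raw_disk_tb": RAW_DISK_TB / SERVERS,
}
usable_tb = RAW_DISK_TB / REPLICATION
cost_per_raw_tb = PRICE_USD / RAW_DISK_TB
cost_per_usable_tb = PRICE_USD / usable_tb

if __name__ == "__main__":
    print(per_server)                    # {'memory_gb': 48.0, 'cores': 12.0, 'raw_disk_tb': 36.0}
    print(f"usable: {usable_tb:.0f} TB")  # usable: 216 TB
    # Prints: $772 per raw TB, $2,315 per usable TB
    print(f"${cost_per_raw_tb:,.0f} per raw TB, ${cost_per_usable_tb:,.0f} per usable TB")
```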

Autonomy IDOL 10
"For far too long, organizations have confined structured data to relational databases and unstructured data to simplistic keyword matching technologies."
"IDOL 10 brings these worlds together, allowing organizations to automatically process, understand, and act on 100 percent of their data, in real-time. The results will be dramatic, as businesses can develop entirely new applications that explore the richness and color of Human Information that live in unstructured, semi-structured, and structured forms."
Price?

Thank you!