The New BI Ecosystem: How Big Data Merges Top-Down and Bottom-Up Computing
Wayne W. Eckerson, Director of Research and Founder, BI Leadership Forum

Agenda
• Big data platforms
– Relational databases
– Analytical databases
– Hadoop

• New analytical ecosystem

What comes next?
• Kilobyte (KB) – 10^3 bytes
• Megabyte (MB) – 10^6 bytes
• Gigabyte (GB) – 10^9 bytes
• Terabyte (TB) – 10^12 bytes
• Petabyte (PB) – 10^15 bytes
• Exabyte (EB) – 10^18 bytes
• Zettabyte (ZB) – 10^21 bytes
• Yottabyte (YB) – 10^24 bytes
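
The unit ladder above follows one rule: each step multiplies by 10^3. A minimal Python sketch of that arithmetic follows; the function names `unit_size` and `humanize` are illustrative assumptions, not part of the original deck.

```python
# Decimal (SI) byte units from the slide; each step is a factor of 10^3.
UNITS = ["KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def unit_size(unit: str) -> int:
    """Return the number of bytes in one decimal unit, e.g. 'TB' -> 10**12."""
    return 10 ** (3 * (UNITS.index(unit) + 1))


def humanize(num_bytes: int) -> str:
    """Express a byte count in the largest unit whose size it reaches."""
    chosen = None
    for unit in UNITS:
        if num_bytes >= unit_size(unit):
            chosen = unit
    if chosen is None:
        return f"{num_bytes} bytes"
    return f"{num_bytes / unit_size(chosen):.1f} {chosen}"
```

For example, `humanize(2_500_000_000_000)` expresses the count in terabytes, because 10^12 is the largest unit size it reaches.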

What is “big data”?
Data
a) Lots of data
b) Different types of data
c) More data than you can handle
Systems
d) Purpose-built analytical systems
e) Distributed file system
f) New staging area and archive
Movement
g) A Java developer’s employment act
h) A replacement for the RDBMS
i) A club for hip data people
Yes!

Information explosion
Every 18 months, non-rich structured and unstructured enterprise data doubles.
[Chart, 2005 to 2012: Unstructured & Content Depot; Structured & Replicated]
Source: IDC Digital Universe 2009, White Paper, Sponsored by EMC, May 2009.
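
To make the 18-month doubling claim concrete, here is a small Python sketch of the compounding arithmetic. It assumes the doubling period stays constant over the whole interval, which the slide does not state; the function name `growth_factor` is hypothetical.

```python
def growth_factor(months: float, doubling_period_months: float = 18.0) -> float:
    """Multiplicative growth after `months`, given a fixed doubling period.

    Assumption: the doubling period is constant for the whole interval.
    """
    return 2.0 ** (months / doubling_period_months)


# The chart spans 2005 to 2012, i.e. 7 years = 84 months.
seven_year_factor = growth_factor(84)
```

Under that assumption, data volume would grow roughly 25-fold over the seven years shown on the chart, since 2^(84/18) is about 25.4.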

Data deluge
• Structured data
– Call detail records
– Point of sale records
– Claims data
• Semi-structured data
– Web logs
– Sensor data
– Email, Twitter
• Unstructured data
– Video, Audio
– Images, Text
Source: “A Sea of Sensors”, The Economist, Nov 4, 2010

From transactions to observations
Structured → Semi-Structured → Unstructured

Three big data platforms (systems)
• General purpose relational database
• Analytical database
• Hadoop

1. General purpose RDBMS
• Powers first generation DW
Benefits:
– RDBMS already in-house
– SQL-based
– Trained DBAs
Challenges:
– Cost to deploy and upgrade
– Doesn’t support complex analytics
– Scalability and performance
[Diagram: Operational Systems → ETL → Data Warehouse → ETL → Data Mart → BI Server → Reports / Dashboards]

2. Analytical platforms
Purpose-built database management systems designed explicitly for query processing and analysis that provide dramatically higher price/performance and availability compared to general purpose solutions.
Deployment options:
– Software only (Paraccel, Vertica)
– Appliance (SAP, Exadata, Netezza)
– Hosted (1010data, Kognitio)
Vendors: 1010data, Aster Data (Teradata), Calpont, Datallegro (Microsoft), Exasol, Greenplum (EMC), IBM SmartAnalytics, Infobright, Kognitio, Netezza (IBM), Oracle Exadata, Paraccel, Pervasive, Sand Technology, SAP HANA, Sybase IQ (SAP), Teradata, Vertica (HP)

Game-changing technology
• Quicker to deploy
– Preconfigured and tuned
– Fast ROI
• Faster and more scalable
– Faster query response times
– Linear performance
• Built-in analytics
– Libraries of functions
– Extensible SDK
• Less costly
– Less power, cooling, space
– Fewer people to maintain

Business value of analytic platforms
• Kelley Blue Book
– Consolidates millions of auto transactions each week to calculate car valuations
• AT&T Mobility
– Tracks purchasing patterns for 80M customers daily to optimize targeted marketing
[Slide labels: Analytical appliance; Analytical database]

3. Hadoop
• Ecosystem of open source projects
• Hosted by Apache Foundation
• Google developed and shared concepts
• Distributed file system that scales out on commodity servers with direct attached storage and automatic failover

Hadoop distilled: What’s new?
Key terms: BIG DATA, Unstructured data, Distributed File System, Data scientist, Open Source, $$, MapReduce, “Schema at Read”, No SQL
Benefits:
– Comprehensive
– Agile
– Expressive
– Affordable
Drawbacks:
– Immature
– Batch oriented
– Expertise
– TCO
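
Two of the terms above, MapReduce and “Schema at Read”, can be illustrated together. The sketch below is plain single-process Python, not the Hadoop API: it only mimics the map, shuffle, and reduce phases. The log lines, the three-field layout, and every function name are hypothetical, chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical raw web-log lines, stored as-is with no schema enforced on write.
RAW_LOGS = [
    "2012-04-01 /home 200",
    "2012-04-01 /cart 500",
    "2012-04-02 /home 200",
    "malformed line",
]


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Apply the schema at read time and emit (status, 1) pairs."""
    fields = line.split()
    if len(fields) != 3:
        return  # skip records that do not fit the schema being read
    _date, _path, status = fields
    yield status, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the framework would between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on one machine."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Calling `run_job(RAW_LOGS)` counts requests per status code. The point of the sketch is that the structure is imposed by the mapper when the data is read, so a different job could parse the same raw lines with a different schema.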

Hadoop ecosystem
[Diagram] Source: Hortonworks

Hadoop use cases
• Sabre Holdings
– Analyze airline shopping data
• Vestas
– Site wind turbines by modeling larger volumes of weather data
• CBS Interactive
– Optimize ad placement and pricing
• Nokia
– Identify new data services

Hadoop hype
Overheard:
• “Hadoop will replace relational databases.”
• “Hadoop will replace data warehouses.”
• “Hadoop has a superior query engine compared to analytical platforms.”
• “Use Hadoop for any application that requires more than one node.”
[Image: Gartner Group – Hype Cycle]

Hadoop adoption rates
No plans        38%
Considering     32%
Experimenting   20%
Implementing     5%
In production    4%
Based on 158 respondents. BI Leadership Forum, April 2012.

Hadoop workloads
[Chart comparing “Today” and “In 18 Months” for each workload: Staging area, Online archive, Transformation engine, Ad hoc queries, Scheduled reports, Visual exploration, Data mining. Plotted values range from 25% to 92%.]
Based on respondents that have implemented Hadoop. BI Leadership Forum, April 2012.

Which platform do you choose?
• Hadoop
• Analytic Database
• General Purpose RDBMS
Structured → Semi-Structured → Unstructured

Big data platform comparison

              RDBMS                 Analytical Database   Hadoop
Purpose       OLTP                  Analytics             Anything
Volume        Low                   Moderate              High
Variety       Relational            Relational+           Variable
Access        SQL                   SQL+                  Java+
Latency       Low                   Moderate              High
Concurrency   High                  Moderate              Low
Cost per GB   High                  Moderate              Low
Role          DW hub or data mart   DW or sandbox         Staging area and archive
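
The comparison above can also be held as a small lookup structure, which makes it easy to ask which platforms match a given requirement. This is a sketch only: the attribute keys and the helper `platforms_where` are assumptions made for illustration, while the values are taken from the comparison itself.

```python
# Platform traits transcribed from the comparison; the key names are illustrative.
PLATFORMS = {
    "RDBMS": {
        "purpose": "OLTP", "volume": "Low", "variety": "Relational",
        "access": "SQL", "latency": "Low", "concurrency": "High",
        "cost_per_gb": "High", "role": "DW hub or data mart",
    },
    "Analytical database": {
        "purpose": "Analytics", "volume": "Moderate", "variety": "Relational+",
        "access": "SQL+", "latency": "Moderate", "concurrency": "Moderate",
        "cost_per_gb": "Moderate", "role": "DW or sandbox",
    },
    "Hadoop": {
        "purpose": "Anything", "volume": "High", "variety": "Variable",
        "access": "Java+", "latency": "High", "concurrency": "Low",
        "cost_per_gb": "Low", "role": "Staging area and archive",
    },
}


def platforms_where(attribute: str, value: str) -> list:
    """Return the names of platforms whose attribute equals the given value."""
    return [name for name, traits in PLATFORMS.items() if traits[attribute] == value]
```

For instance, `platforms_where("cost_per_gb", "Low")` selects Hadoop, which matches its listed role as a staging area and archive.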

The New BI Ecosystem

BI Framework 2020
[Diagram of four domains: Business Intelligence, Analytics Intelligence, Continuous Intelligence, Content Intelligence]
Layers: End-User Tools; Design Framework; Architecture
Labels shown:
– Data Warehousing; Reports and Dashboards; MAD Dashboards; BI tools
– Analytic Sandboxes; Ad hoc query, visual exploration; Exploration; Power Users
– Event-Driven Alerts and Dashboards; Dashboard Alerts; Event detection and correlation; CEP, Streams
– Keyword search; Hadoop, NoSQL databases
– Excel, Access, Spreadsheets, OLAP, Ad hoc SQL, Data mining, Visual Analysis, Analytic Workbenches
– Key-value pairs, XML schema, Xquery, graph notation, Hive, MapReduce, HDFS, Java, etc.

BI Framework

“Business Intelligence”: TOP DOWN, “Schema Heavy”
Corporate Objectives and Strategy
Reporting & Monitoring (Casual Users)
Data Warehousing Architecture
Predefined Metrics; Non-volatile Data
Reports Beget Analysis
Pros:
– Alignment
– Consistency
Cons:
– Hard to build
– Politically charged
– Hard to change
– Expensive

Analytics: “Schema Light”
Processes and Projects
Analysis and Prediction (Power Users)
Analytics Architecture
Ad hoc queries; Volatile Data
Analysis Begets Reports
Pros:
– Quick to build
– Politically uncharged
– Easy to change
– Low cost
Cons:
– Alignment
– Consistency

The new analytical ecosystem
[Diagram]
Regions: Top-down Architecture; Bottom-up Architecture
Data sources: Operational Systems (structured data), Machine Data, Web Data, Audio/video Data, External Data, Documents & Text
Data movement: Extract, Transform, Load (batch, near real-time, or real-time); Streaming/CEP Engine
Components: Data Warehouse, Dept Data Mart, BI Server, Hadoop Cluster, Analytic platform or non-relational database
Sandboxes: Virtual Sandboxes, In-memory Sandbox, Free-standing Sandbox
Users: Casual User, Power User

Analytical sandboxes
[Diagram: the analytical ecosystem with its sandboxes highlighted]
Sandboxes: Virtual Sandboxes, In-memory Sandbox, Free-standing Sandbox
Regions: Top-down Architecture; Bottom-up Architecture
Data sources: Operational Systems (structured data), Machine Data, Web Data, Audio/video Data, External Data, Documents & Text
Data movement: Extract, Transform, Load (batch, near real-time, or real-time); Streaming/CEP Engine
Components: Data Warehouse, Dept Data Mart, BI Server, Hadoop Cluster, Analytic platform or non-relational database
Users: Casual User, Power User

Workflows
[Diagram]
Components: Source Systems; Analytical database (DW); Analytical tools
Approaches: “Capture only what’s needed”; “Capture in case it’s needed”
Steps shown include:
– Extract, transform, load
– Parse, aggregate
– Explore data
– Report and mine data

Recommendations
• Explore applications for multi-structured data
• Apply the right tool for the job
– RDBMS, Analytical platform, Hadoop, NoSQL
• Make power users full-fledged members of your BI environment
• Reconcile top-down and bottom-up BI environments
→ Create an analytical ecosystem!

Questions?
• Wayne Eckerson
• weckerson@bileadership.com
• Analytical thought leader
• Founder, BI Leadership Forum
• Director of Research, TechTarget
• Former director of research at TDWI
• Author