
A CASE STUDY: Excel 2010 – A Powerful BI Tool for Analyzing Big Datasets

Kashan Jafri, Richard Mintz, Evan Ross, Marc Wright

Copyright

This document is provided “as-is”. Information and views expressed throughout, including URLs and other Internet Web site references, may change without notice. You bear the risk of using the content. Some examples depicted here are provided for illustration purposes only and bear no real association or connection to any known entities or organizations, and no such inference is intended nor should be supposed.

You may copy and use this document for your internal, reference purposes. This document does not provide you with any legal rights to any intellectual property.

© 2012 Dimensional Strategies Inc. All rights reserved.

INTRODUCTION

This paper discusses our approach, findings and recommendations for architecting a high-throughput, general purpose Business Intelligence solution using an I/O-balanced approach to symmetric multiprocessing (SMP), with Microsoft SQL Server 2012 Enterprise at the core and Excel 2010 as the front-end business analytical tool. We describe our experiences and the design path we took with a relatively stable yet large dataset of about 76 billion rows. Our considerations focus on approaches to relatively high-volume data needs and do not specifically address high-velocity, high-variety or highly complex data challenges.

This content is relevant to audiences including CIOs, CTOs, IT planners, architects, DBAs, and business intelligence (BI) users with an interest in deploying SMP-based DW/BI capabilities that address big-dataset management with reporting and analysis leveraging the power and familiarity of features found in Microsoft Excel 2010.

WHAT IS BIG DATA, REALLY?

Defining Big Data

In our internet-enabled world, humans and machines are generating more than 2.5 exabytes (2,500,000,000,000,000,000 bytes!) of data every 24 hours. More data has been created in the last two years than has existed throughout the history of the human race! We have numerous sources for this data: social media posts, digital pictures and videos, business and financial transaction records, etc. There is nothing new about the data itself; however, the average speed at which we are now creating new and significant data holdings is astonishingly fast. This is what people mean when they talk about “Big Data”.

Big data can be described and classified using four key aspects:

1. Volume — How Plentiful?
2. Velocity — How Fast?
3. Variety — How Different/Varied?
4. Complexity — How Difficult to manage and/or analyze?

Volume — How Plentiful: Many organizations are simply floundering in the current ocean of ever-growing data with all its many forms and conceivable sizes. This data is also driving the creation of even more data — data derived from data! How can an organization turn the data from over 256 million daily financial trades into useable, legible and actionable information? (The Toronto Stock Exchange, May 2012)

Velocity — How Fast: Sometimes a minute or possibly two is the decision window. For time-sensitive processes such as fraud detection or identifying a known terrorist at a border crossing, big data must often be analyzed and queried as it comes pouring into a business via live transactions and interactions. How do you keep a country safe and open to travel and trade but closed to crime with the following kinds of stats and data-points: Number of travelers processed — 24,513,…; number of aircraft processed — 90,463; number of land vehicles processed (cars, trucks, busses) — 9,…,685… (The Canada Border Services Agency – CBSA, April to June 2012)

Variety — How Different/Varied: Big data is any type of data. Structured data is about traditional relational databases like Microsoft Access or SQL Server; unstructured data is anything else – plain text, audio, video, PDFs, Microsoft Office documents, etc. Search and index all global web content (all text, all videos, all documents, all audio, and every picture), then serve this mosaic of content up to over 212 million Americans in the course of one month! (Google, August 2012)

Complexity — How Difficult to manage and/or analyze: People relationships and interactions are amongst the most complex and actively morphing information domains to model, monitor and analyze: “Who is Who?”, “Who Knows Who?” and “Who Does What?” In order to answer these questions, organizations must resolve and relate all relevant sources of data (complex, large volumes), and then present that information to the decision maker at the point of decision (rapid delivery).

A payment services company leverages corporate data to conduct fraud detection—the process allows them to deter more than US$37.7 million in fraudulent transactions. (MoneyGram International, 2011)

Are 76 Billion Rows Of Data Big?

In this case study, we are describing our experience with a relatively stable, yet large dataset of about 76 billion rows. Our considerations focus on approaches for relatively high-volume data — and do not specifically address high-velocity, high-variety or highly complex data challenges (as described above).

SOLUTION GOALS AND DRIVERS

Simply put, our client needed a cost-effective data warehouse back-end that could serve close to 40 users and handle over 20 terabytes of raw, uncompressed data (representing over 76 billion rows).

Business Requirements and Use Case

Acquisition cost and total cost of ownership were key drivers, with our client requesting a non-proprietary approach (not a specialized solution), with standard, mainstream industry approaches where possible, leveraging the lowest cost to purchase and to operate.

Performance

The solution required us to efficiently process complex queries on large historical datasets, providing a throughput of 2 GB/sec on reads and roughly 1 GB/sec on writes. Expected response times for most data queries are within minutes. The following table provides three success measures that the client required us to meet (a representative query is sketched after the requirements list below):

Use Case                     Average Number of Rows Returned    Expected Response Time
Run a low-volume query       400,000 rows                       A few minutes
Run a medium-volume query    1.5 million rows                   10 minutes
Run a high-volume query      5-10 million rows                  30 minutes

General Requirements
 The data warehouse must be optimized to address report and analysis needs
 In-house query tools and other off-the-shelf analytical tools must be able to integrate with the new data warehouse back-end
 The data must only be accessible by named and managed departments within the organization
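To make these success measures concrete, here is a minimal sketch of what the medium-volume use case might look like as T-SQL. The star-schema names (dbo.FactTransaction, dbo.DimDate, dbo.DimAccount) and their columns are hypothetical placeholders for illustration, not the client's actual schema.

```sql
-- Hypothetical medium-volume query: return roughly 1.5 million detail rows
-- for one month of activity in a single region, suitable for export to
-- Excel or a downstream process. All object names are illustrative only.
SELECT  d.CalendarDate,
        a.AccountNumber,
        f.TransactionAmount
FROM    dbo.FactTransaction AS f
JOIN    dbo.DimDate         AS d ON f.DateKey    = d.DateKey
JOIN    dbo.DimAccount      AS a ON f.AccountKey = a.AccountKey
WHERE   d.CalendarDate >= '20120501'
  AND   d.CalendarDate <  '20120601'
  AND   a.Region = N'Ontario';
```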

Ease of Use
 Self-service analytics – end-users must be able to quickly conduct their own queries to unlock insights with interactive data exploration and graphing/charting
 The solution must provide query tools with an intuitive user interface for creating ad hoc queries and continuing analysis in Microsoft Excel
 Ability to export to Excel .XLS or .CSV formats
 Ability to visually categorize and select data (e.g. by industry or by client, etc.)

TECHNICAL DESIGN STEPS AND PRINCIPLES

In the client’s use cases, data was consumed in two ways. First, ad-hoc analysis is performed on aggregate-level data to identify trends and patterns. This type of analysis would ideally be performed in an easy-to-use interface, with little or no knowledge of SQL or the underlying data structures. Second, if further investigation is required, end-users need the ability to extract millions of rows of data for analysis in other downstream processes.

Several approaches were considered when architecting the SMP solution used at our client. Because of the large data volumes (170 million rows per day), not all approaches would be feasible on currently available hardware. In creating the best possible performance at a reasonable cost, we decided to take a multi-tier approach with regard to the hardware and software. The diagrams accompanying each approach below provide an overview of the reference architecture.

APPROACHES

The following section describes the various approaches considered for the solution. Each approach is described along with its respective pros and cons. This development methodology of quickly standing up solutions and evaluating them against the client’s business requirements allowed us to quickly arrive at the solution best suited to the client’s needs.

Approach 1 – SQL 2012 Tabular in In-Memory Mode

The first approach considered was to use the new Analysis Services Tabular In-Memory model released in SQL 2012. This approach would give the best end-user experience. From an end-user standpoint, all front-end tools (Pivot Tables, Power View, PerformancePoint, and Reporting Services) would consume data from the same source – the tabular model – giving a consistent view across all tools. However, due to the data volume (76 billion rows over 2 years), the resulting tabular model was estimated to be roughly 2 terabytes in size. Even though fairly robust SMP hardware was being considered, with 1 terabyte of RAM, a 2 terabyte model would still be forced to swap half of the model to disk, making the performance of the system unacceptable.

[Figure: Approach 1 – SQL 2012 Tabular in In-Memory Mode. Source data (flat files, ~170 million rows/day) flows through SSIS ETL into a SQL 2012 data warehouse (2 years of data, ~76 billion rows, rolling daily partitions), then into a SQL 2012 Analysis Services Tabular In-Memory model (~2 TB compressed) consumed by Excel PivotTables (MDX queries), Reporting Services list reports, and SharePoint 2010, PerformancePoint & Power View (DAX queries, rich visualization) for end-users.]

Approach 2 – Column Store Index & SQL 2012 Tabular in DirectQuery Mode

The second approach would be to still use an Analysis Services Tabular model, but in DirectQuery mode. This allows the resulting queries to be pushed down to the SQL database engine. When combined with a column store index on the fact tables, performance would be acceptable for the data volumes we were considering. The main drawback to this approach is that DirectQuery models only support the DAX query language, not MDX. This means that the only supported front-end tool at this time is Power View, which is an excellent tool for visualizing data but is lacking in analytic functionality when compared to Excel Pivot Tables or traditional OLAP front-end tools. Without a powerful analytic front-end tool, this approach will not work for the vast majority of business users.
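As a rough illustration of the column store component shared by this and the following approaches, the statement below creates a SQL Server 2012 nonclustered columnstore index over the most frequently queried fact columns, reusing the hypothetical object names from the earlier sketch.

```sql
-- Minimal sketch: a SQL Server 2012 nonclustered columnstore index over
-- the columns most queries touch. Object names are hypothetical.
CREATE NONCLUSTERED COLUMNSTORE INDEX csi_FactTransaction
ON dbo.FactTransaction (DateKey, AccountKey, ProductKey, TransactionAmount);

-- Note: in SQL Server 2012, a table is read-only while a nonclustered
-- columnstore index exists on it, so new rows are normally loaded into a
-- staging table and switched in as a partition (see Approach 4).
```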

Approach 3 – Column Store Index & SQL 2012 SSAS Multidimensional with ROLAP Fact Table

Once Analysis Services Tabular was ruled out as a solution for our client, we began considering traditional Analysis Services multidimensional models. An OLAP cube was created based on the star schema in the SQL data warehouse, at first using a ROLAP fact table in order to take advantage of the column store index already being created to service list reporting through SSRS. The end-user experience using an OLAP cube was acceptable – users would perform ad-hoc analysis through Excel Pivot tables, and drill through to SSRS reports to access detail-level data. Since Power View is not supported on multidimensional cubes, it is not available with this approach. For our client, Power View would be nice to have as a visualization tool, but it was not considered a requirement for this solution.

[Figure: Approach 3 – Column Store Index & SQL 2012 SSAS Multidimensional with ROLAP Fact Table. Source data (flat files, ~170 million rows/day) flows through SSIS ETL into the SQL data warehouse; the multidimensional cube pushes SQL queries down to the ROLAP fact table, while Excel PivotTables & PowerPivot (MDX queries, rich visualization), Reporting Services list reports (SQL queries), and SharePoint 2010 / PerformancePoint serve end-users for ad-hoc analysis.]
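To illustrate why the ROLAP cube can reuse the column store index: in ROLAP mode, Analysis Services answers MDX by generating relational SQL behind the scenes, roughly of the shape sketched below. This is a simplified, hypothetical stand-in; the SQL that SSAS actually emits is more elaborate.

```sql
-- Illustrative stand-in for the star-join aggregates SSAS issues against
-- a ROLAP measure group. The columnstore index lets SQL Server satisfy
-- this with a fast batch-mode scan of the fact table.
SELECT   d.CalendarYear,
         d.CalendarMonth,
         SUM(f.TransactionAmount) AS TotalAmount,
         COUNT_BIG(*)             AS TransactionCount
FROM     dbo.FactTransaction AS f
JOIN     dbo.DimDate         AS d ON f.DateKey = d.DateKey
GROUP BY d.CalendarYear, d.CalendarMonth;
```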

Approach 4 – Column Store Index & SQL 2012 SSAS Multidimensional with MOLAP Fact Table

In order to further improve performance for end-users, a traditional OLAP cube with MOLAP fact tables was also created, but this time partitioned by day. This allows the nightly ETL process to only load the most recent day of data, greatly reducing the overall run-time of the nightly process. Both the end-user experience and the performance of the system were deemed acceptable, and this was selected as the most desirable approach.

[Figure: Approach 4 – Column Store Index & SQL 2012 SSAS Multidimensional with MOLAP Fact Table. Source data (flat files, ~170 million rows/day) flows through SSIS ETL into the SQL 2012 data warehouse (column store index on the fact table, 2 years of data, ~76 billion rows, rolling daily partitions) and into a SQL 2012 Analysis Services OLAP cube with a MOLAP fact table in daily partitions; Excel PivotTables & PowerPivot (MDX queries, rich visualization), Reporting Services list reports (SQL/MDX queries), and SharePoint 2010 / PerformancePoint serve end-users for ad-hoc analysis.]
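The daily partitioning pattern described above might look like the following minimal sketch, again with hypothetical object names; a real deployment would script roughly 730 daily boundaries (two years) and automate the nightly steps.

```sql
-- Sketch: daily range partitioning on an integer yyyymmdd date key.
-- dbo.FactTransaction is assumed to be created ON psDaily(DateKey).
CREATE PARTITION FUNCTION pfDaily (int)
AS RANGE RIGHT FOR VALUES (20120801, 20120802, 20120803);

CREATE PARTITION SCHEME psDaily
AS PARTITION pfDaily ALL TO ([PRIMARY]);

-- Nightly pattern: create the new day's partition, bulk-load the day into
-- a staging table with an identical structure (including its own
-- columnstore index and a CHECK constraint on DateKey), then switch the
-- staged data in. Switching leaves the read-only columnstore fact
-- partitions untouched and keeps the nightly window short.
ALTER PARTITION SCHEME psDaily NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION pfDaily() SPLIT RANGE (20120804);

ALTER TABLE dbo.FactTransaction_Stage
SWITCH TO dbo.FactTransaction PARTITION $PARTITION.pfDaily(20120804);
```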

CONCLUSIONS

Designing a system for analysis versus reporting requires different techniques to ensure optimum performance, and multiple solutions may be required to address the business requirements. The choice of tools also makes a difference, with new features such as Power View requiring a tabular model; but by keeping calculations and hierarchies within the relational model, we can ensure that all methods of analysis are using the same data and will return the same result (a single version of the truth).

The following summarizes the four approaches:

Approach #1 – SQL 2012 Tabular in In-Memory Mode
BI tools supported: All Microsoft tools (PowerPivot, Power View, SSRS, etc.) can consume this model; Power View supported.
Conclusions/Observations: The tabular model was about 2 TB in size; in the end, the data volumes prohibited the use of this option.

Approach #2 – Column Store Index & SQL 2012 Tabular in DirectQuery Mode
BI tools supported: DirectQuery only supports DAX-capable query tools; Power View supported, but Excel does not support DAX queries.
Conclusions/Observations: We had to abandon this option.

Approach #3 – Column Store Index & SQL 2012 SSAS Multidimensional with ROLAP Fact Table
BI tools supported: SSRS, Excel, and PerformancePoint fully supported; Power View not supported (Power View does not consume multidimensional cubes).
Conclusions/Observations: Uses extra disk space (compared to our MOLAP option).

Approach #4 – Column Store Index & SQL 2012 SSAS Multidimensional with MOLAP Fact Table
BI tools supported: SSRS, Excel, and PerformancePoint fully supported; Power View not supported (Power View does not consume multidimensional cubes).
Conclusions/Observations: The MOLAP cube allowed for a very granular partitioning strategy (by day) while still delivering very good query responses; this was the selected approach.

REFERENCES

Fast Track Data Warehouse on SQL Server Web site
http://www.microsoft.com/sqlserver/en/us/solutions-technologies/data-warehousing/fast-track.aspx

Fast Track Data Warehouse Reference Guide for SQL Server 2012
http://download.microsoft.com/download/D/2/0/D20E1C5F-72EA-4505-9F26-FEF9550EFD44/Fast%20Track%20DW%20Reference%20Guide%20for%20SQL%202012.docx

How to Choose the Right Reporting and Analysis Tools to Suit Your Style
http://download.microsoft.com/download/D/2/0/D20E1C5F-72EA-4505-9F26-FEF9550EFD44/MicrosoftReportingToolChoices%2020120327%201643E3.docx

Choosing a Tabular or Multidimensional Modeling Experience in SQL Server 2012 Analysis Services

All about PowerPivot for Microsoft Excel
http://www.microsoft.com/en-us/bi/powerpivot.aspx

SQL Server Web site
http://www.microsoft.com/en-us/sqlserver/

SQL Server DevCenter
http://msdn.microsoft.com/en-us/sqlserver/

SQL Server TechCenter
http://technet.microsoft.com/en-us/sqlserver/