Data Mining with Excel 2010 and PowerPivot

Mark Tabladillo Ph.D. MTabladillo <(at)> solidq.com September 18, 2010

SQL Saturday 46 -- Raleigh NC #sqlsat46

© 2010 Mark Tabladillo Ph.D.

MarkTab & Data Mining

Outline

What is Data Mining

What is PowerPivot

Demos

Data Mining as a Service

What is Data Mining

Data Mining Definitions
• Data mining
• Machine learning
• Data mining algorithms -- typically use estimation or optimization to achieve results (as opposed to only calculations).

Data Mining Tasks
• Supervised
  • Answer known, what is correlated?
• Unsupervised
  • Answer unknown (unspecified), what are the groups?
• Forecasting
  • Given a trend, what is next?
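
The three task types can be sketched in plain Python, for illustration only. The helper names (`pearson`, `two_means`, `forecast_next`) and the toy data are assumptions, and these simple stand-ins are not the algorithms that Analysis Services runs.

```python
from statistics import mean


def pearson(xs, ys):
    """Supervised flavor: the answer (ys) is known; how correlated is xs with it?"""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def two_means(values, iterations=10):
    """Unsupervised flavor: no answer given; split values into two groups.

    A 1-D, two-cluster k-means sketch. Assumes the values are not all equal.
    """
    low, high = min(values), max(values)
    for _ in range(iterations):
        groups = ([], [])
        for v in values:
            # Assign each value to the nearer of the two current centers.
            groups[0 if abs(v - low) <= abs(v - high) else 1].append(v)
        low, high = mean(groups[0]), mean(groups[1])
    return sorted(groups[0]), sorted(groups[1])


def forecast_next(series):
    """Forecasting flavor: fit a least-squares line to the trend and extend it one step."""
    n = len(series)
    xs = range(n)
    mx, my = mean(xs), mean(series)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, series)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my + slope * (n - mx)


if __name__ == "__main__":
    print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))
    print(two_means([1, 2, 3, 10, 11, 12]))
    print(forecast_next([10, 12, 14, 16]))
```

Each function estimates or optimizes something from the data (a correlation, cluster centers, a trend line), which is the distinction the definitions slide draws against plain calculation.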

Value Slide

Data Mining Add-In for Excel
• Requires Analysis Services instance
• Version 10.00.2531.00 (April 2009)
• 32-Bit Add-In
• Microsoft .NET Framework 2.0 (32-bit)
• Office 2007 (Professional, Professional Plus, Ultimate, Enterprise)
• SQL Server Enterprise or Standard (or Developer) 2008 or higher

The Analyze Tab

Menu Option                      Data Mining Algorithm
Analyze Key Influencers          Naïve Bayes
Detect Categories                Clustering
Fill from Example                Logistic Regression
Forecast                         Time Series
Highlight Exceptions             Clustering
Scenario Analysis (Goal Seek)    Logistic Regression
Scenario Analysis (What If)      Logistic Regression
Prediction Calculator            Logistic Regression
Shopping Basket Analysis         Association Rules
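
The menu-to-algorithm pairing above can also be read the other way, by algorithm. The sketch below is only a lookup over the pairs listed on this slide; `TOOL_TO_ALGORITHM` and `tools_by_algorithm` are hypothetical names and not part of the add-in.

```python
from collections import defaultdict

# Pairs copied from the Analyze tab table on this slide.
TOOL_TO_ALGORITHM = {
    "Analyze Key Influencers": "Naïve Bayes",
    "Detect Categories": "Clustering",
    "Fill from Example": "Logistic Regression",
    "Forecast": "Time Series",
    "Highlight Exceptions": "Clustering",
    "Scenario Analysis (Goal Seek)": "Logistic Regression",
    "Scenario Analysis (What If)": "Logistic Regression",
    "Prediction Calculator": "Logistic Regression",
    "Shopping Basket Analysis": "Association Rules",
}


def tools_by_algorithm(mapping):
    """Invert the table: algorithm -> list of menu options that use it."""
    grouped = defaultdict(list)
    for tool, algorithm in mapping.items():
        grouped[algorithm].append(tool)
    return dict(grouped)


if __name__ == "__main__":
    for algorithm, tools in tools_by_algorithm(TOOL_TO_ALGORITHM).items():
        print(f"{algorithm}: {', '.join(tools)}")
```

Grouped this way, the nine menu options rest on five algorithms, with Logistic Regression behind four of them.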

Data Mining Tab

Many

Data Mining Capacities
SQL Server 2008 R2 Analysis Services

Object                                                            Maximum sizes/numbers
Maximum data mining models per structure                          2^31-1 = 2,147,483,647
Maximum data mining structures per solution                       2^31-1 = 2,147,483,647
Maximum data mining structures per Analysis Services database     2^31-1 = 2,147,483,647
Maximum data mining attributes (variables) per structure          2^31-1 = 2,147,483,647
Reference: http://www.marktab.net/datamining/index.php/2010/08/01/sql-serverdata-mining-capacities-2008-r2/
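
The limit 2^31-1 is the largest value a signed 32-bit integer can hold. A short check of that arithmetic follows; `MAX_OBJECTS` and `fits_in_int32` are illustrative names, and whether Analysis Services actually stores these counts as 32-bit integers is an assumption here, not something the slide states.

```python
import struct

# The capacity figure from the table: 2^31 - 1.
MAX_OBJECTS = 2**31 - 1


def fits_in_int32(n):
    """Return True if n can be packed as a signed 32-bit integer."""
    try:
        struct.pack("<i", n)
        return True
    except struct.error:
        return False


if __name__ == "__main__":
    print(f"{MAX_OBJECTS:,}")
    print(fits_in_int32(MAX_OBJECTS))      # the limit itself fits
    print(fits_in_int32(MAX_OBJECTS + 1))  # one more does not
```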

What is PowerPivot

PowerPivot for Excel
• Take advantage of familiar Excel tools and features
• Process massive amounts of data in seconds
• Load even the largest data sets from virtually any source
• Use powerful new analytical capabilities, such as Data Analysis Expressions (DAX)
• Make the most of multi-core processors and gigabytes of memory

PowerPivot for Excel Sources
• SQL Server
• SQL Azure
• Oracle, Teradata, Sybase, Informix, IBM DB2
• OLEDB/ODBC
• Analysis Services (SSAS)
• Reporting Services (SSRS)
• Excel, Text File

PowerPivot Reference
• http://www.powerpivot.com (Product Site)
• http://www.powerpivotpro.com (Blog Site)

Demos

Resources
• MarkTab.NET Blog, links, video resources and information for data mining
• Blog: http://marktab.net/datamining
• Twitter: @MarkTabNet

Regroup and Conclusion
• Main Points from this Presentation

Contact Information
• Mark Tabladillo mtabladillo <{at}> solidq.com
• Also on: Twitter, LinkedIn
