For Advanced Business
Intelligence Applications
Matt Casters: Chief Architect, Data Integration and
Kettle Project Founder
kettle.pentaho.org
Kettle project homepage
kettle.javaforge.com
Kettle community website: forum, source, documentation, tech tips, samples, …
www.pentaho.org/download/
All Pentaho modules, pre-configured with sample data
Developer forums, documentation
Ventana Research Open Source BI Survey
www.mysql.com
White paper -
http://dev.mysql.com/tech-resources/articles/mysql_5.0_pentaho.html
Kettle Webinar -
http://www.mysql.com/news-and-events/on-demand-webinars/pentaho-2006-09-1
Pentaho Data Integration MySQL support
Challenges
More data is being gathered all the time
Data is coming from more sources than ever
Faster access to stored information is becoming more important
More people require concurrent access to the data
Advantages
Reduces query time by reducing the amount of data to “plough” through.
Increases performance by:
Automatically “pruning” the list of partitions to search. The MySQL query optimizer does this based on the query being issued.
Massive reduction in I/O
Smaller partitioned indexes leading to faster index tree traversal
Allowing parallel access to the different partitions
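The pruning idea above can be sketched in a few lines of Python. This is only an illustration of the principle, not how the MySQL optimizer is implemented; the `Partition` class, the `prune` function and the one-partition-per-year layout are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """Hypothetical description of one range partition (years inclusive)."""
    name: str
    min_year: int
    max_year: int


def prune(partitions, year_from, year_to):
    """Keep only the partitions whose year range overlaps the query range."""
    return [
        p for p in partitions
        if p.min_year <= year_to and p.max_year >= year_from
    ]


# Assumed layout: one partition per year, 2003 through 2006.
partitions = [Partition(f"p{year}", year, year) for year in range(2003, 2007)]

# A query restricted to 2005 only needs to touch a single partition.
selected = prune(partitions, 2005, 2005)
print([p.name for p in selected])  # prints ['p2005']
```

The other three partitions are never read, which is where the reduction in I/O comes from.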
Table partitioning
A test query aggregating speed & counts per road position per minute
Returned 642,319 rows
9 seconds for MySQL to get the result
29 seconds to pass the data to the client over JDBC
Table partitioning
[Diagram: the Sales table split into yearly partitions (2003, 2004, 2005, 2006), distributed over separate databases (DB1, DB3, DB4 are labelled)]
Database partitioning
Demo time:
Creating partitions partitioned
Loading data partitioned
Reading back data partitioned
Reading back data partitioned and ordered
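The demo steps can be approximated outside Kettle with a small Python sketch. It assumes a simple modulo rule on the `year` field to pick a partition and uses a sorted merge to read the data back in order; the function names and the sample rows are hypothetical, and the rule used in the actual demo is not stated here.

```python
import heapq


def partition_rows(rows, key, n_partitions):
    """Distribute rows over n_partitions buckets using key modulo n (assumed rule)."""
    buckets = [[] for _ in range(n_partitions)]
    for row in rows:
        buckets[row[key] % n_partitions].append(row)
    return buckets


def read_back_ordered(buckets, sort_key):
    """Sort each partition on its own, then merge the sorted streams."""
    streams = [sorted(bucket, key=lambda r: r[sort_key]) for bucket in buckets]
    return list(heapq.merge(*streams, key=lambda r: r[sort_key]))


# Made-up sample rows for illustration only.
rows = [
    {"year": 2005, "amount": 30},
    {"year": 2003, "amount": 10},
    {"year": 2006, "amount": 40},
    {"year": 2004, "amount": 20},
]

buckets = partition_rows(rows, "year", 2)   # loading data partitioned
ordered = read_back_ordered(buckets, "year")  # reading back partitioned and ordered
print([r["year"] for r in ordered])  # prints [2003, 2004, 2005, 2006]
```

Each partition only sorts its own share of the rows; the final merge is a cheap pass over already sorted streams.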
Table partitioning
A test query aggregating speed & counts per road position per minute
Returned 642,319 rows
3 seconds for MySQL to get the result
10 seconds to pass the data to the client over JDBC
Demonstrating almost linear scalability!!
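The speed-up behind this claim can be worked out from the two sets of timings quoted in this deck (9 s and 29 s for the first run, 3 s and 10 s for the second). The short Python calculation below uses only those four numbers; how close the result is to linear depends on the number of partitions used in the demo, which is not stated in the text.

```python
# Timings in seconds, taken from the two test runs quoted in the slides.
before = {"query": 9, "transfer": 29}
after = {"query": 3, "transfer": 10}

total_before = sum(before.values())
total_after = sum(after.values())

speedup_query = before["query"] / after["query"]
speedup_transfer = before["transfer"] / after["transfer"]
speedup_total = total_before / total_after

print(total_before, total_after)   # prints 38 13
print(round(speedup_query, 2))     # prints 3.0
print(round(speedup_transfer, 2))  # prints 2.9
print(round(speedup_total, 2))     # prints 2.92
```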
Table partitioning
[Diagram: the Sales table with yearly partitions 2003, 2004, 2005 and 2006 on Server X, database DBx]
Pentaho Data Integration : Clustering
Demo-time
Start up 2 slave servers
Run a step across the 2 servers
Monitor
Pentaho Metadata to the rescue
The problem:
Reporting becomes harder on a database partitioned system
Instead of one database, you read from a bunch of them
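To make the problem concrete, here is a minimal sketch of what a report has to do by hand once the data lives in several databases: run the same query against each one and merge the partial results. It uses in-memory SQLite databases as a stand-in for the MySQL partitions; the `sales` table, its columns and the sample rows are assumptions for illustration, and this is not how Pentaho Metadata itself works.

```python
import sqlite3


def make_db(rows):
    """Create an in-memory database holding one slice of a hypothetical sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (year INTEGER, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def total_per_year(connections):
    """Run the same aggregate on every database and combine the partial sums."""
    totals = {}
    query = "SELECT year, SUM(amount) FROM sales GROUP BY year"
    for conn in connections:
        for year, amount in conn.execute(query):
            totals[year] = totals.get(year, 0) + amount
    return dict(sorted(totals.items()))


# Two made-up database partitions, each holding different years.
databases = [
    make_db([(2003, 10), (2004, 20)]),
    make_db([(2005, 30), (2005, 5)]),
]

result = total_per_year(databases)
print(result)  # prints {2003: 10, 2004: 20, 2005: 35}
```

Every report would need this fan-out and merge logic, which is why reporting gets harder on a database partitioned system.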
Key Projects
JFreeReport Reporting
Kettle Data Integration
Mondrian OLAP
Pentaho BI Platform
Weka Data Mining
[Diagram: BI layers (Scorecards, Analysis, Aggregates, Reports, Operational) across departmental areas: Sales, Marketing, Inventory, Production, Financial]
Pentaho Introduction
Demo and overview
Questions and Closing
Birds of a Feather