
Pentaho Data Integration 4 and MySQL

Matt Casters: Pentaho's Chief of Data Integration, Kettle project founder

MySQL User Conference, Tuesday April 13th, 2010

Agenda
- Pentaho: an introduction
- Pentaho Data Integration
- Version 4: new features
- MySQL support in PDI
- Q&A

Pentaho Introduction


Commercial open source alternative for business intelligence (BI)
Founded in 2004: pioneer in commercial open source BI
Large referenceable customer base, wide range of BI/DW deployments

Management: proven BI and open source veterans
- Business Objects, Cognos, Hyperion, JBoss, Oracle, Red Hat, SAS, SugarCRM

Board of Directors: deep expertise and proven success in open source
- Bob Bearden: Executive Chairman of the Board (formerly SpringSource)
- Larry Augustin: founder, VA Software; helped coin the phrase "open source"
- Zack Urlocker: VP of Products, MySQL/Oracle
- Benchmark Capital, Index Ventures, New Enterprise Associates

Widely recognized as the leader in open source BI

Pentaho Introduction


Complete Business Intelligence Suite
- End-to-end coverage of all BI needs
- Standards-based, modular, standalone or embeddable platform

Open Source Licensing
- Lower software acquisition costs
- Lower Total Cost of Ownership (TCO)

Enterprise Development Methodology
- Transparent, detailed roadmap
- Product roadmap and contributions managed by Pentaho
- Core developers are Pentaho employees
- Extensive QA

Expert Services
- Comprehensive training, consulting and enterprise service offerings
- Delivered by the experts

Pentaho Introduction - Enterprise Edition

Pentaho Introduction - Deployments

Wide range of deployments
- Reporting
- Data Integration / ETL
- Dashboards
- Full BI Suite

Thousands of users
- 3,000 on a single server

Large data volumes
- Half a terabyte of live interactive OLAP data
- ETL loading 300K rows/second

Sophisticated applications
- Hundreds of dimensions

Small deployments as well
- 20 users, MS Access databases

Pentaho Introduction - Technology

Componentized and modular, service-implemented architecture
- Built "from the ground up" as a set of services
- Exposed via AJAX and Web Services

100% Java EE server side
- Scalable, standards-based

Web-based, thin-client end user interfaces
Graphical design interfaces
Embedded process workflow engine

Pentaho Introduction - Reporting

Access and format data from disparate sources
- RDBMS, XML, OLAP

Produce in popular formats

Multiple report types
- Operational, analytical, financial, parameterized

Go directly against data sources or Pentaho's centralized metadata layer

Pentaho Introduction - Analysis

Navigate and explore
- Ad hoc, interactive analysis
- Drill into further detail
- Select specific members for analysis

View data "dimensionally"
- e.g. sales by region, by channel, by time period

ROLAP architecture
- Works with all popular open source and proprietary DBs
- No intermediate storage
- Aggregate table "aware" for faster analytic queries

Design tools to build OLAP schemas and improve query performance
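The ROLAP points above can be illustrated with a small Python sketch. It assumes a hypothetical star schema (`sales_fact` plus an `agg_sales_region` aggregate table, with SQLite standing in for the database) and answers "sales by region, by channel, by year" directly with SQL, reading the aggregate table when one exists for the requested dimension and falling back to the fact table otherwise. It only mimics the idea of aggregate awareness; it is not how Pentaho Analysis works internally.

```python
import sqlite3

# Hypothetical star-schema fragment held in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (region TEXT, channel TEXT, year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
    [
        ("EMEA", "web", 2009, 100.0),
        ("EMEA", "store", 2009, 50.0),
        ("APAC", "web", 2010, 70.0),
        ("EMEA", "web", 2010, 30.0),
    ],
)

# Pre-computed aggregate table at the "region" grain (hypothetical name).
conn.execute(
    "CREATE TABLE agg_sales_region AS "
    "SELECT region, SUM(amount) AS amount FROM sales_fact GROUP BY region"
)

DIMENSIONS = {"region", "channel", "year"}
AGGREGATES = {"region": "agg_sales_region"}  # dimension -> aggregate table, if any


def sales_by(dimension):
    """Total sales per member of one dimension, preferring an aggregate table."""
    if dimension not in DIMENSIONS:  # whitelist, since the name is spliced into SQL
        raise ValueError(f"unknown dimension: {dimension}")
    table = AGGREGATES.get(dimension, "sales_fact")
    sql = (
        f"SELECT {dimension}, SUM(amount) FROM {table} "
        f"GROUP BY {dimension} ORDER BY {dimension}"
    )
    return conn.execute(sql).fetchall()


print(sales_by("region"))   # served from agg_sales_region
print(sales_by("channel"))  # no aggregate for channel: scans sales_fact
```

No intermediate cube is stored: every answer is computed from relational tables at query time, and the aggregate table only shortens the scan.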

Pentaho Introduction - Dashboards

Gain visibility into your organization's key performance indicators (KPIs)
- Monitor top-level performance and drill into supporting detail
- Illuminate metrics for quick insight into business activities
- Track exceptions and receive alerts

Leverage the full Pentaho BI Suite
- Comprehensive auditing of user activity, performance and data access
- Context-sensitive drilling to reports and analysis views
- Integrated security, scheduling, alerting, portal integration

Integrate with 3rd-party and custom applications

Pentaho Introduction - Dashboard Designer

Web-based end user dashboard creation
- From the Pentaho User Console ("zero training")

Template and theme-based creation
Incorporate reports, analysis views, Adobe Flash-based charts and other Pentaho content
Create new charts and interactive data grids from scratch
- Pentaho metadata: no SQL required

Filter controls

Pentaho Introduction - BI Platform

Provides critical services for end users
- Easy access to business information
- Intuitive scheduling
- Delivery over the web or via email
- Alerting and notification

Pentaho User Console

Provides critical services for administrators
- Centralized thin-client administration
- Data source and security management
- Auditing and performance monitoring
- Enterprise security integration
- Definition and execution of business rules
- Integration points with 3rd-party applications

Pentaho Enterprise Console

Pentaho Introduction - Metadata

- Provides an abstraction layer between source systems and business user concepts
- Graphical design environment for defining the metadata model
- Data presented to business users in business terms
- Allows business users to create their own ad hoc reports based on centralized business rules, without any technical skills or knowledge of SQL
- Changes to the physical database do not impact reports or analytic views

[Figure: Physical Database Model → Business Intelligence Metadata → Business User, linked by automated SQL generation]
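A rough sketch of what "automated SQL generation" from a metadata layer means, in Python. The model below (`BusinessColumn`, `MODEL`, `JOINS`, `generate_sql`) and the physical names `dim_customer`, `fact_sales`, `cust_nm` and `amt` are all hypothetical; Pentaho Metadata has its own model format and query generator, and this version handles only a single join hop.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BusinessColumn:
    """One business-facing field mapped onto a physical table column."""
    name: str                        # what the business user sees
    table: str                       # physical table
    column: str                      # physical column
    aggregate: Optional[str] = None  # e.g. "SUM" for measures


# Hypothetical metadata model: cryptic physical names hidden behind business terms.
MODEL = {
    "Customer Name": BusinessColumn("Customer Name", "dim_customer", "cust_nm"),
    "Revenue": BusinessColumn("Revenue", "fact_sales", "amt", "SUM"),
}
# Join conditions keyed by an alphabetically sorted pair of table names.
JOINS = {("dim_customer", "fact_sales"): "dim_customer.cust_id = fact_sales.cust_id"}


def generate_sql(selected):
    """Build a SELECT for the chosen business fields (single join hop only)."""
    cols = [MODEL[name] for name in selected]
    select_parts, group_parts = [], []
    for c in cols:
        expr = f"{c.table}.{c.column}"
        if c.aggregate:
            select_parts.append(f'{c.aggregate}({expr}) AS "{c.name}"')
        else:
            select_parts.append(f'{expr} AS "{c.name}"')
            group_parts.append(expr)
    tables = sorted({c.table for c in cols})
    sql = f"SELECT {', '.join(select_parts)} FROM {tables[0]}"
    for other in tables[1:]:
        sql += f" JOIN {other} ON {JOINS[tuple(sorted((tables[0], other)))]}"
    # Group by the plain columns only when the query also contains a measure.
    if group_parts and len(group_parts) < len(cols):
        sql += " GROUP BY " + ", ".join(group_parts)
    return sql


print(generate_sql(["Customer Name", "Revenue"]))
```

Because reports ask for "Customer Name" rather than `dim_customer.cust_nm`, a renamed physical column only requires an edit to `MODEL`, which is the decoupling described in the last bullet.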

Pentaho Introduction - Data Mining

- Take BI to the next level with predictive analytics
- Gain insight into hidden patterns and relationships
- Discover indicators of future performance
- Exploit correlations to improve organizational performance
- Embed recommendations in reports, dashboards, or custom applications


Pentaho Data Integration for BI

Business Intelligence? That's what we do.

Pentaho Data Integration - Kettle

Kettle = Kettle Extraction, Transportation, Transformation and Loading Environment

Pentaho Data Integration - Extraction

Extract data from:
- 35+ database types: MySQL, PostgreSQL, SQLite, ..., Oracle, SQL Server, etc.
- Text files
- XML files
- XLS files
- Xbase files (dBase, FoxPro, etc.)
- File system information
- Generated data
- MS Access files
- LDAP
- Geo-data
- ...
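Whatever the source, an input step hands the rest of the transformation a stream of rows with named fields. The Python sketch below imitates that with standard-library readers for a text (CSV) file, an XML document and a database table; the function names and sample data are hypothetical, and Kettle's own input steps are Java classes with many more options.

```python
import csv
import io
import sqlite3
import xml.etree.ElementTree as ET


def rows_from_csv(text):
    """Text file input: one dict per record, field names from the header line."""
    yield from csv.DictReader(io.StringIO(text))


def rows_from_xml(text, row_tag):
    """XML input: one dict per <row_tag> element, built from its child elements."""
    for elem in ET.fromstring(text).iter(row_tag):
        yield {child.tag: child.text for child in elem}


def rows_from_db(conn, query):
    """Table input: one dict per result row, field names from the cursor."""
    cur = conn.execute(query)
    names = [d[0] for d in cur.description]
    for record in cur:
        yield dict(zip(names, record))


# Usage: three different sources end up as the same kind of row stream.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customer (id TEXT, name TEXT)")
db.execute("INSERT INTO customer VALUES ('4', 'Dee')")

rows = [
    *rows_from_csv("id,name\n1,Ann\n2,Bob\n"),
    *rows_from_xml("<rows><row><id>3</id><name>Cy</name></row></rows>", "row"),
    *rows_from_db(db, "SELECT id, name FROM customer"),
]
print(rows)
```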

Pentaho Data Integration - Transportation

Transportation of data
- Engine-based data transfer (no code generator)
- Very flexible pathways: splitting, partitioning, merging, joining, duplicating, clustering (MPP)
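In Kettle each step of a running transformation works on its own thread and passes rows on to the next step; the pathways listed above are decisions about where each row goes. This Python sketch shows only those routing decisions on plain lists (no threads or buffers). The function names are hypothetical, and `partition` uses a simple remainder-of-division rule on an integer key as one example of a partitioning method.

```python
from itertools import cycle


def distribute(rows, n):
    """Splitting: round-robin, so each row goes to exactly one of n targets."""
    targets = [[] for _ in range(n)]
    for target, row in zip(cycle(targets), rows):
        target.append(row)
    return targets


def copy_to_all(rows, n):
    """Duplicating: every one of the n targets receives every row."""
    rows = list(rows)
    return [list(rows) for _ in range(n)]


def partition(rows, key, n):
    """Partitioning: the remainder of an integer key picks the target."""
    targets = [[] for _ in range(n)]
    for row in rows:
        targets[row[key] % n].append(row)
    return targets


def merge(*streams):
    """Merging: several streams feed one step. This sketch simply concatenates;
    a running engine would interleave rows in arrival order."""
    return [row for stream in streams for row in stream]


orders = [{"id": i} for i in range(1, 6)]
print(distribute(orders, 2))
print(partition(orders, "id", 2))
```

The difference between `distribute` and `partition` matters downstream: with partitioning, all rows with the same key always reach the same target, which is what grouping or per-partition loading needs.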

Pentaho Data Integration - Transformation

Flexibly transform data
- Looking up data: databases, files, memory, ...
- Calculating
- Scripting: JavaScript, SQL, RegExp
- Splitting
- Mapping
- Selecting
- Filtering
- Pivoting
- ...
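Several of the operations above can be chained on one row stream. The Python sketch below does a lookup, a calculation, a regular-expression clean-up and a filter in sequence; the lookup dictionary, the field names and the rules are hypothetical and only illustrate the kind of work a transformation performs.

```python
import re

# Hypothetical in-memory lookup table (stands in for a database or file lookup).
COUNTRY_LOOKUP = {"BE": "Belgium", "US": "United States"}
NON_DIGITS = re.compile(r"\D")


def transform(rows):
    """Lookup, calculate, clean with a regex, then filter, one row at a time."""
    for row in rows:
        out = dict(row)  # leave the incoming row untouched
        out["country_name"] = COUNTRY_LOOKUP.get(row["country"], "Unknown")  # lookup
        out["total"] = row["qty"] * row["unit_price"]                          # calculation
        out["phone"] = NON_DIGITS.sub("", row["phone"])                        # RegExp clean-up
        if out["total"] > 0:                                                   # filter
            yield out


orders = [
    {"country": "BE", "qty": 2, "unit_price": 5.0, "phone": "+32 (0)3 123"},
    {"country": "XX", "qty": 0, "unit_price": 9.0, "phone": "n/a"},
]
print(list(transform(orders)))
```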

Pentaho Data Integration - Loading

Load data into a target format
- Database loads
- Data warehouse population
- Partitioned loading
- Bulk loading
- Parallel loading
- Clustering
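A common pattern behind database loads is to insert rows in batches and commit once per batch instead of once per row. The sketch below shows that pattern with Python's `sqlite3` module standing in for the target database; `load` and its `commit_size` parameter are hypothetical names, and table and column names are assumed to be trusted because they are spliced into the SQL text.

```python
import sqlite3
from itertools import islice


def load(conn, table, columns, rows, commit_size=1000):
    """Insert rows in batches, committing once per batch; returns the row count."""
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    it = iter(rows)
    loaded = 0
    while True:
        batch = list(islice(it, commit_size))
        if not batch:
            break
        conn.executemany(sql, batch)  # one batched statement per chunk
        conn.commit()                 # one transaction per chunk
        loaded += len(batch)
    return loaded


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
n = load(conn, "customer", ["id", "name"], ((i, f"name{i}") for i in range(2500)), commit_size=1000)
print(n, conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0])
```

A larger batch means fewer round trips and commits, at the cost of more work to redo if a batch fails.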

Pentaho Data Integration 4 6n#iron!ent


(u"" D;I a""ed 7Spoon8 to edit e#ery option in @ett"e
Drag & Drop De$ugger 3i h D;I

Co!!and "ine too"s


e5e ute 0o$s e5e ute transfor!ations

-e$ ser#er
"ustering re!ote e5e ution

Progra!!ing API for 2a#a P"ugin e o.syste! FFF

Pentaho Data Integration - Community

Paying Pentaho customers
- Large and small corporations
- All possible sectors

http://www.ohloh.net/projects/362

Lone rangers & hobbyists
- All regions on Earth
- Use our JIRA case tracking system
- Download more than 10,000 copies of Kettle per month
http://www.softpedia.com/progClean/#ettle$Clean$%&&' .html

Meet on our forum: more than 30,000 posts in 3 years

Pentaho Data Integration - Use cases

- Load data from text files and store it in a database [demo]
- Export data from a database to text files or to one or more other databases
- Data migration between database applications
- Exploration of data in existing databases (tables, views, etc.)
- Information improvement using lookups
- Data cleaning
- Application integration
- Data warehouse population
- Report data generation
- ...

Pentaho Data Integration - Adoption

Wide range of production deployments
- Small and medium-sized companies
- Large enterprises

Rapid product evolution
- Driven by Pentaho investment
- Includes significant community contributions
- "Contribution-friendly" architecture
- Natural fit for additional data sources, targets and transformations

Pentaho Data Integration - Adoption

Most deployed open source data integration solution: independent study by Mark Madsen of Third Nature and the BeyeNETWORK
Download the free study at pentaho.com

Pentaho Data Integration - Links

- Homepage: http://kettle.pentaho.org
- Forum: http://forums.pentaho.org/forumdisplay.php?f=69
- Case tracker: http://jira.pentaho.org/browse/PDI
- Continuous Integration Server: http://ci.pentaho.com/job/Kettle
- Wiki: http://wiki.pentaho.org/display/EAI
- IRC channel: ##pentaho (on Freenode)
- Mailing list: http://groups.google.com/group/kettle-developers
- My blog: http://www.ibridge.be
- My coordinates: mcasters at pentaho dot org


Version 4: New features - Visualisation

- Demo
- New welcome screen
- Mouse-over slide-outs for icons
- Hop creation
- Improved error handling configuration
- New perspectives: support for Agile BI visualisations, modelling, scheduling, etc.

Version 4: New features - Running jobs

- Drill down into running job entries
- Visual indicators of running and completed job entries
- Success and failure mini-icons
- Mousing over the completion mini-icons shows details of execution results
- Log capturing of completed job entries

Version 4: New features - Running transformations

- Drill down into running transformation job entries and mappings
- Row input/output sniff testing: see which rows are passing (demo)
- Remote input/output sniff testing on a Carte server
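Sniff testing amounts to watching the rows that pass a point in a running transformation without changing them. A minimal Python sketch of that idea is a pass-through generator with a callback; `sniff` and its `limit` parameter are hypothetical and not part of Spoon or Carte.

```python
def sniff(rows, callback, limit=None):
    """Pass rows through unchanged, showing at most `limit` of them to `callback`."""
    shown = 0
    for row in rows:
        if limit is None or shown < limit:
            callback(row)
            shown += 1
        yield row


captured = []
out = list(sniff(iter([{"id": 1}, {"id": 2}, {"id": 3}]), captured.append, limit=2))
print(captured)  # the first two rows only; `out` still holds all three
```

The downstream consumer sees exactly the same rows whether or not anyone is sniffing, which is what makes it safe to attach to a live transformation.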

Version 4: New features - Better logging

- Reduced memory consumption
- Incremental log updates
- Global log buffer size limit for long-running jobs/transformations
- Interval logging
- Automatic clean-up of old log records
- Log record time-outs & execution lineage
- Log record colour coding in Spoon (blue and red for error lines)
- Step and job entry level logging
- Execution lineage logging
- Renaming individual columns
- Global configuration options for all log tables
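Several of the logging items (a global buffer size limit, incremental updates, clean-up of old records, step-level channels) can be sketched with a bounded in-memory buffer. Everything below (`LogBuffer`, `LogLine`, the field names) is hypothetical Python, not Kettle's logging API; it assumes lines are added in time order, which the purge loop relies on.

```python
import time
from collections import deque
from dataclasses import dataclass


@dataclass
class LogLine:
    nr: int           # sequence number, used for incremental updates
    timestamp: float
    channel: str      # e.g. a step or job entry name
    level: str
    message: str


class LogBuffer:
    """Size-limited, in-memory log buffer (hypothetical, illustration only)."""

    def __init__(self, max_size=5000):
        self.lines = deque(maxlen=max_size)  # oldest lines drop off automatically
        self.last_nr = 0

    def add(self, channel, level, message, timestamp=None):
        self.last_nr += 1
        ts = time.time() if timestamp is None else timestamp
        self.lines.append(LogLine(self.last_nr, ts, channel, level, message))

    def since(self, nr):
        """Incremental update: only the lines added after line number `nr`."""
        return [line for line in self.lines if line.nr > nr]

    def purge_older_than(self, max_age_s, now=None):
        """Clean-up: drop lines older than `max_age_s` seconds (assumes time order)."""
        now = time.time() if now is None else now
        while self.lines and now - self.lines[0].timestamp > max_age_s:
            self.lines.popleft()


buf = LogBuffer(max_size=3)
for i in range(5):
    buf.add("Table output", "Basic", f"message {i}", timestamp=float(i))
print([line.message for line in buf.lines])
```

A client that remembers the last line number it displayed can call `since()` to fetch only new lines, instead of re-reading the whole log on every refresh.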

Version 4: New features - Plugins

- Unified plug-in architecture
- Easier deployment and packaging
- Step, job entry, partitioner, database type, Spoon perspective, life-cycle, ...: all pluggable, e.g. the MySQL 5.1 plugin
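The core idea of a unified plug-in architecture is that every pluggable kind of component is registered under a type and an ID and looked up by ID at run time. Kettle does this in Java; the Python sketch below only illustrates the registry idea, and `PluginRegistry`, the "database"/"step" type names and both example plugins are hypothetical.

```python
from collections import defaultdict


class PluginRegistry:
    """Hypothetical registry: plugin type -> plugin ID -> factory."""

    def __init__(self):
        self._plugins = defaultdict(dict)

    def register(self, plugin_type, plugin_id):
        """Class decorator that files a factory under (type, ID)."""
        def decorator(factory):
            self._plugins[plugin_type][plugin_id] = factory
            return factory
        return decorator

    def ids(self, plugin_type):
        return sorted(self._plugins.get(plugin_type, {}))

    def create(self, plugin_type, plugin_id, *args, **kwargs):
        return self._plugins[plugin_type][plugin_id](*args, **kwargs)


registry = PluginRegistry()


@registry.register("database", "MYSQL")
class MySQLDatabaseType:
    def quote(self, name):
        return f"`{name}`"  # MySQL quotes identifiers with backticks


@registry.register("step", "UpperCase")
class UpperCaseStep:
    def process(self, row):
        return {k: v.upper() if isinstance(v, str) else v for k, v in row.items()}


# The core only knows the type and the ID, never the concrete class.
print(registry.ids("database"))
print(registry.create("database", "MYSQL").quote("order"))
```

Adding a new database type or step then means registering one more class; nothing in the code that calls `create` has to change.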

Version 4: New features - Repositories

- Allows for 3rd-party repositories like the Pentaho Unified Enterprise Repository
- Removed dependencies on the relational database repository (still supported, though)
- Added support for repositories capable of team development (file locking)
- Added support for repositories capable of fine-grained security
- Added support for repositories capable of storing and retrieving revision history

Version 4: New features - New steps

- SAP Input
- Data Grid
- OLAP Input (Mondrian, Palo, SSAS, SAP BW)
- Palo Cell Input/Output, Dimension Input/Output
- Salesforce Delete, Insert, Update, Upsert
- Add fields changing sequence (group sequence)
- User Defined Java Class: create your own plugin in Java, on the fly, in a step
- Send information using Syslog: send a message to a Syslog server
- Java Filter
- Memory Group By
- Farrago streaming bulk loader
- Teradata Fastload bulk loader
- Experimental steps like Get table names, Email messages input, ...
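The "add fields changing sequence" idea can be sketched as a counter that restarts whenever the values of the chosen fields change. This Python version is a hypothetical illustration, not the step's code; like any change-based sequence it expects input sorted on the group fields, otherwise a group that reappears starts again at 1, as the last row in the example shows.

```python
def add_group_sequence(rows, fields, name="seq", start=1, increment=1):
    """Add a counter that restarts whenever the values of `fields` change."""
    previous = object()  # sentinel that never equals a real key
    counter = start
    for row in rows:
        key = tuple(row[f] for f in fields)
        if key != previous:
            counter, previous = start, key
        else:
            counter += increment
        yield {**row, name: counter}


rows = [{"customer": "A"}, {"customer": "A"}, {"customer": "B"}, {"customer": "A"}]
print([r["seq"] for r in add_group_sequence(rows, ["customer"])])  # [1, 2, 1, 1]
```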


MySQL Support in PDI

- JDBC/ODBC driver integration
- Reading: MySQL result streaming (cursor emulation) support
- Writing: MySQL dialects for data types
- Job entry: bulk loader of text files for MySQL
- Job entry: bulk writer to a text file for MySQL
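The bulk loader job entry works on text files that MySQL can read with `LOAD DATA`. The Python sketch below writes rows in MySQL's default text format (tab-separated, backslash escapes, `\N` for NULL) and builds a matching `LOAD DATA LOCAL INFILE` statement. The helper names are hypothetical, the path is not escaped, and `LOCAL` loading must be enabled on both client and server (`local_infile`); this is not the code PDI uses.

```python
def escape(value):
    """Render one field in MySQL's default LOAD DATA text format."""
    if value is None:
        return r"\N"  # NULL marker
    s = str(value)
    # Backslash first, so the escapes added next are not doubled.
    return s.replace("\\", "\\\\").replace("\t", "\\t").replace("\n", "\\n")


def to_line(row):
    return "\t".join(escape(v) for v in row) + "\n"


def write_bulk_file(path, rows):
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.writelines(to_line(row) for row in rows)


def load_data_sql(path, table, columns):
    """LOAD DATA statement matching the file layout above (path is not escaped)."""
    cols = ", ".join(f"`{c}`" for c in columns)
    return (
        f"LOAD DATA LOCAL INFILE '{path}' INTO TABLE `{table}` "
        "FIELDS TERMINATED BY '\\t' ESCAPED BY '\\\\' "
        "LINES TERMINATED BY '\\n' "
        f"({cols})"
    )


print(load_data_sql("/tmp/sales.txt", "sales", ["id", "amount", "note"]))
```

Writing a file and loading it in one statement avoids per-row INSERT overhead, which is why bulk loaders take this route.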

Database Partitioning (Sharding) Demo

Database partitioning

[Diagram: the Sales table (years 2003, 2004, 2005, 2006) is split into one partition per year, "Year 2003 Partition" through "Year 2006 Partition", each stored in a separate database]
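The partitioning in the diagram can be sketched in Python: each row is routed to the database that holds its year, and a query is run on every partition with the partial results added up. In-memory SQLite databases stand in for the MySQL shards, and `make_shards`, `load_partitioned` and `total_sales` are hypothetical helpers, not PDI's partitioning API.

```python
import sqlite3

YEARS = [2003, 2004, 2005, 2006]  # one partition (database) per year, as in the diagram


def make_shards():
    """Hypothetical shard map: year -> its own in-memory database."""
    shards = {}
    for year in YEARS:
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE sales (year INTEGER, amount REAL)")
        shards[year] = conn
    return shards


def load_partitioned(shards, rows):
    """Route every (year, amount) row to the database of its year."""
    for year, amount in rows:
        # An unknown year raises KeyError: there is no partition for it.
        shards[year].execute("INSERT INTO sales VALUES (?, ?)", (year, amount))
    for conn in shards.values():
        conn.commit()


def total_sales(shards):
    """Run the same query on every partition and add up the partial results."""
    return sum(
        conn.execute("SELECT COALESCE(SUM(amount), 0) FROM sales").fetchone()[0]
        for conn in shards.values()
    )


shards = make_shards()
load_partitioned(shards, [(2003, 10.0), (2004, 5.0), (2004, 1.0), (2006, 4.0)])
print(total_sales(shards))
```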

Questions and Closing

Other Pentaho-related User Conference information:
- Collapsing BI from Months to Minutes (Agile BI)
  Jared Cornelius, Ballroom H, 11:55am, Tuesday April 13th
- MySQL Binary Log Analysis With Pentaho BI
  Robert Booth, Ballroom B, 5:15pm, Wednesday April 14th
- The Pentaho Booth 516 in the Exhibition Hall

ETA: September 2010
