3rd year, Engineering Cycle - GI
Business Intelligence / Data Warehouse
Y. ELYOUNOUSSI, 2021-2022

TP7: PDI Performing Basic Operations with Databases

1. Connecting to a database

Before interacting with a database in PDI, you need to prepare the environment. For demonstration purposes, we will work with the MySQL RDBMS engine, but with minor changes you should be able to reproduce the same tutorials with any other RDBMS. The examples are based on the Sakila database, which represents the business processes of a DVD rental store.

ACTION:
1. Install the MySQL RDBMS if it is not already installed on your machine.
2. Install MySQL Workbench or any other visual database design tool.
3. Install the Sakila sample database.

The Sakila sample database is available from https://dev.mysql.com/doc/index-other.html. The archive contains three files: sakila-schema.sql, sakila-data.sql, and sakila.mwb.
• The sakila-schema.sql file contains all the CREATE statements required to create the structure of the Sakila database.
• The sakila-data.sql file contains the INSERT statements required to populate the database.
• The sakila.mwb file is a MySQL Workbench data model that you can open within MySQL Workbench to examine the database structure.

In order to interact with a particular database, you have to define a connection to it. A PDI database connection describes all the parameters needed to connect PDI to a database. PDI also needs the corresponding JDBC driver. As Pentaho cannot redistribute some third-party database drivers due to licensing restrictions, you may have to install the driver manually.

ACTION:
1. Open the lib directory inside the PDI
installation folder and verify whether a driver (a JAR file) for your database engine is already present. If there is one, skip the following instructions unless you want to replace it with a newer version.
2. Download the MySQL JDBC driver.
3. Copy the JDBC driver to the lib directory.
4. Restart Spoon.

Once you have the driver installed, you will be able to create the connection. As an example, the following instructions explain how to connect to the Sakila database installed on a local server:

ACTION:
1. Open Spoon and create a new Transformation.
2. In the upper-left corner of the screen, click on the View tab.
3. Right-click on the Database connections option and click on New.
4. Fill in the Database Connection dialog window with the proper settings. Figure 1 shows the settings for the Sakila sample database:

Figure 1: Configuring Sakila database connection

5. Click on the Test button to verify that your database connection is correct.
6. Click on OK to close the database definition window. A new database connection named sakila is added to the tree.
7. Right-click on the created database connection and click on Share so that the connection is available in all transformations or jobs you create from now on. Shared connections are shown in bold letters.
8. Save the transformation and close it.

2. Exploring a database with the Database Explorer

Before beginning to work with the data stored in a database, it is useful to become familiar with that database or to verify that it contains all the data you need. For this, Spoon offers a Database Explorer. There are several ways to access the explorer:
• Right-click on the connection in the Database connections list and select Explore in the contextual menu.
• Go to the menu option Tools | Database | Explore.
  In the window that shows up, select the database to explore and click on OK.

When you open the Database Explorer, the first thing that you see is a tree with the different objects of the database. When you right-click on a database table, a contextual menu appears with several options for exploring that table.

ACTION:
1. Open the Database Explorer for the Sakila database.
2. Select any table (e.g. the actor table) from the Sakila database, and try some of the exploring options for that table.

3. Getting data from a database

The Table input step is the main step to get data from a database. To run very simple queries, the use of this step is straightforward.

ACTION:
1. Create a new Transformation and save it.
2. From the Input category of steps, select and drag a Table input step to the work area.
3. Double-click on the step.
4. As Connection, select the Sakila connection.
5. Click on the Get SQL select statement... button. The Database Explorer window appears.
6. Expand the tables list, select city, and click on OK.
7. PDI asks you if you want to include the field names in the SQL. Answer Yes.
8. Click on Preview and then OK. The following window appears (figure 2):

Figure 2: Previewing Table input step

One of the ways you can make your query more flexible is by passing it some parameters. Suppose that you want to list the films that belong to a given category and you have that category in a properties file.
If the category in your file is Comedy, the query to run will look as follows:

    SELECT f.title
    FROM film f
    JOIN film_category fc ON f.film_id = fc.film_id
    JOIN category ca ON fc.category_id = ca.category_id
    WHERE ca.name = 'Comedy';

In your Transformation, you want to replace 'Comedy' with a parameter, so the category changes according to the content of your file. This is done in two parts:
• You have to prepare a stream with the parameter(s) that the query will receive.
• You adapt the query to receive these parameters.

ACTION:
1. Create a new Transformation and save it.
2. Add to the work area the following steps: a Property Input, a Filter rows, and a Select values step. Link them as follows:

(figure showing the linked steps)

3. Use the preceding steps to read the properties file named categories.properties, filtering on the category key and selecting only the column that contains the category. If you preview the last step, you should see something like this:

Figure 4: Previewing the parameters

Now that you have a stream with the field that will be plugged in as a parameter, you are ready to adapt the query:
4. After the Select values step, add a Table input step.
5. Edit the Table input step and type the query introduced previously, replacing the value of the parameter with a question mark (?).
6. In the Insert data from step option, select the name of the step from which the parameters will come; in this case, Select values.
7. Close the window. You will see that the hop that links the Select values step with the Table input step changes its look and feel, showing that the Select values step feeds the Table input with data.
8. With the Table input selected, run a preview.

Instead of getting the parameters from an incoming step, we can replace the question marks with names of Kettle variables.

ACTION:
1.
Create a new Transformation and save it.
2. Add a Table input step.
3. Double-click on the Table input step and type the query used previously, this time replacing the last line with:

    WHERE ca.name = '${CATEGORY}';

4. Check the Replace variables in script? checkbox.
5. Run the Transformation.
6. Fill in the Variables tab in the settings dialog window by providing a value for the CATEGORY variable.

Kettle variables have several advantages over the use of question marks:
• You can use the same variable more than once in the same query.
• You can use variables for any portion of the query and not just for the values; for example, you could have the following query:

    SELECT ${COLUMNS} FROM film

  The result will then vary with the content of the ${COLUMNS} variable.

4. Inserting, updating and deleting data

By now, you know how to get data from a database. Now, you will learn how to insert, update, and delete data in a database.

a. Inserting/updating data into a database table

The Table output step is the main PDI step to insert new data into a database table. The use of this step is simple. You have to create a stream with the data that you want to insert. At the end of the stream, you add a Table output step and configure it to perform the operation.

ACTION:
1. Using the MySQL Workbench tool, create a new database named mydb.
2. Create the table "student" as described in the figure below (figure 4). The "student_id" column is auto-incremented.

Figure 4: "student" table structure

3. Create a new Transformation and save it.
4. Add a Data grid step to the work area. Configure it to create a stream that contains the following fields: nom, prenom, date_naissance and adresse. Fill in the grid with at least 5 rows.
5. Create a new connection to the mydb database and share it.
6.
Add a Table output step to the work area and link it with the Data grid step.
7. Double-click the Table output step to configure it (see figure 5).

Figure 5: Table output configuration window

In order to fill in the Database fields tab, you have to check the Specify database fields option.
8. Run the transformation and explore the "etudiants" database table to check whether the five rows of the Data grid have been appended.

While the Table output step allows you to insert new data, the Insert/Update step allows you to do both, insert and update data, in a single step. If you only want to perform updates, you can use the Update step instead.

ACTION:
1. Open the last transformation and save it under a different name.
2. Delete the Table output step and replace it with an Insert/Update step.
3. Do what it takes to add a new student row and update the address of an existing student.
4. Run the transformation and check the "etudiants" table.

b. Deleting data

The Delete step allows you to delete records of a database table based on a given condition.

ACTION:
1. Create and save a new Transformation.
2. Add Data grid and Delete steps.
3. Do what it takes to delete all the records of the "etudiants" table that match the condition: date_naiss_etudiant > '2000-01-01'

To delete database table records we can also use the Execute row SQL script step. To use this step, you create a new string field and use it to define the SQL statement to execute. After that, you add an Execute row SQL script step, and the step will execute the statement for every input row.

ACTION:
1. Save the previous Transformation with another name.
2.
Using the Execute row SQL script step, try to delete all the "etudiants" table records that match the condition: prenom_etudiant = 'Ahmed'
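
As a rough illustration of what the Execute row SQL script step does, the sketch below uses Python's standard sqlite3 module with an in-memory database as a stand-in for MySQL (an assumption made only to keep the example self-contained; it is not how PDI itself is configured). The id_etudiant column and the sample rows are hypothetical; only the table name and the prenom_etudiant and date_naiss_etudiant columns come from the exercises. Each input row carries a string field holding one SQL statement, and the loop runs that statement once per row.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for the MySQL "mydb" database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE etudiants ("
    "id_etudiant INTEGER PRIMARY KEY AUTOINCREMENT, "  # hypothetical key column
    "prenom_etudiant TEXT, "
    "date_naiss_etudiant TEXT)"
)

# Hypothetical sample data, inserted with ? placeholders.
conn.executemany(
    "INSERT INTO etudiants (prenom_etudiant, date_naiss_etudiant) VALUES (?, ?)",
    [
        ("Ahmed", "1999-05-12"),
        ("Sara", "2001-03-04"),
        ("Ahmed", "2002-11-30"),
        ("Omar", "1998-07-21"),
    ],
)

# The incoming stream: each row has one string field containing the SQL to execute.
rows = [{"sql": "DELETE FROM etudiants WHERE prenom_etudiant = 'Ahmed'"}]

# Execute the statement carried by every input row and count the deleted records.
deleted = 0
for row in rows:
    cursor = conn.execute(row["sql"])
    deleted += cursor.rowcount
conn.commit()

remaining = [
    r[0]
    for r in conn.execute(
        "SELECT prenom_etudiant FROM etudiants ORDER BY id_etudiant"
    )
]
print(deleted, remaining)  # prints: 2 ['Sara', 'Omar']
```

Adding a second row to the stream, for example one whose statement deletes records with date_naiss_etudiant > '2000-01-01', would make the loop run two statements, one per row.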
