
Department of Computing

CS423: Data Warehousing and Data Mining


Class: BESE/BSCS
Lab 03: Understanding ETL through Talend-Transform & Load
Date: 07 October, 2021
Time: 2:00 pm - 4:40 pm

Course Instructor: Dr. Rabia Irfan


Lab Engineer: Shakeela

Name: Muhammad Umer Farooq


Class: BESE 9A
CMS ID: 266086

CS423: Data Warehousing and Data Mining Page 1


Lab 3: Understanding ETL through Talend-Transform & Load
Introduction
In computing, extract, transform, load (ETL) is the general procedure of copying data from one
or more sources into a destination system which represents the data differently from the source(s)
or in a different context than the source(s). The ETL process became a popular concept in the
1970s and is often used in data warehousing. ETL systems commonly integrate data from
multiple applications (systems), typically developed and supported by different vendors or
hosted on separate computer hardware. The separate systems containing the original data are
frequently managed and operated by different employees. For example, a cost accounting system
may combine data from payroll, sales, and purchasing. ETL also makes it possible to migrate
data between a variety of sources, destinations, and analysis tools. As a result, the ETL process
plays a critical role in producing business intelligence and executing broader data management
strategies.

Objectives
After performing this lab, students should be able to:
1. Transform data integrated from multiple sources
2. Load the transformed data into a new table

Tools/Software Requirement
Talend
Procedure
This lab is a continuation of Lab 2, so follow the procedure described in Lab 2.

Task
Continuing from the data integration performed in Lab 02, load the values into a new
Agent Rating table with the columns (Agent, Code, Total Sales, Rating). The transformation
rules for the rating are as follows:
• An agent with total sales > 5 is Best
• An agent with total sales >= 3 and <= 5 is Good
• An agent with total sales < 3 is Average
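
The three rules above can be sketched as a small Java method. This is only an illustration of the rule boundaries, not part of the Talend job: the class name RatingRules and the method name rate are hypothetical, and total sales is assumed to be a whole number.

```java
// Illustrative sketch of the rating rules; RatingRules and rate are hypothetical names.
class RatingRules {

    // Maps an agent's total sales (assumed to be an integer) to a rating label.
    static String rate(int totalSales) {
        if (totalSales > 5) {
            return "Best";      // total sales > 5
        } else if (totalSales >= 3) {
            return "Good";      // 3 <= total sales <= 5
        } else {
            return "Average";   // total sales < 3
        }
    }

    public static void main(String[] args) {
        // Print the rating on each side of the two boundaries (3 and 5).
        for (int sales : new int[] {2, 3, 5, 6}) {
            System.out.println(sales + " -> " + rate(sales));
        }
    }
}
```

Checking the values 2, 3, 5 and 6 covers both sides of each boundary, which is where an off-by-one mistake in the comparison operators would show up.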



Task Solution:
Table Creation in PostgreSQL:

Query:

CREATE TABLE agents_rating (
    _agent_name_ VARCHAR(42) NOT NULL,
    _agent_code_ VARCHAR(8) NOT NULL,
    _total_sales_ INTEGER NOT NULL,
    rating VARCHAR(40)
);

Updating the Talend tMap to include one more output:



Rating attribute expression in the Agents_Rating output of the tMap:

row1._Total_Sales_ > 5 ? "BEST" : row1._Total_Sales_ <= 5 && row1._Total_Sales_ >= 3 ? "GOOD" : "AVERAGE"
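
As a rough check of how this expression is evaluated, the sketch below compares it with an explicitly parenthesized form. In Java the conditional operator groups from right to left, so the second condition is only reached when the first one is false, which makes the "<= 5" test redundant. The class and method names are hypothetical, and the column is assumed to be a plain int, replaced here by a method parameter.

```java
// Sketch only: TernaryCheck, asWritten and simplified are hypothetical names,
// and Total Sales is assumed to be an int.
class TernaryCheck {

    // The tMap expression as written, with row1._Total_Sales_ replaced by a parameter.
    static String asWritten(int totalSales) {
        return totalSales > 5 ? "BEST" : totalSales <= 5 && totalSales >= 3 ? "GOOD" : "AVERAGE";
    }

    // Same logic with explicit grouping and without the redundant "<= 5" test.
    static String simplified(int totalSales) {
        return totalSales > 5 ? "BEST" : (totalSales >= 3 ? "GOOD" : "AVERAGE");
    }

    public static void main(String[] args) {
        // Compare both forms over a small range that covers every branch.
        for (int s = -1; s <= 10; s++) {
            if (!asWritten(s).equals(simplified(s))) {
                throw new AssertionError("Mismatch at " + s);
            }
        }
        System.out.println("Both forms agree for -1..10");
    }
}
```

Either form can be used in the tMap; the parenthesized one is arguably easier to read when more rating levels are added later.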

Updating the Talend job to include the third output (Rating_Agents):



For testing purposes, I first log the result to the console to check the output:

Console-Based Output Screenshot:

Updating the Talend job to include a tPostgresqlOutput component that receives the third output
(Agents Rating from the tMap):



Connecting to the PostgreSQL database:

Talend Output:



Checking the table “agents_rating”:

Query:
SELECT * FROM agents_rating;



Deliverable
Following are the deliverables of this lab:
1. CSV format of the new Agent Rating table created in Talend
2. Data transformation and loading view in Talend (through different screenshots)¹
Submission should be in a single zipped folder (.zip format) at the LMS link provided.

¹ The transformation and loading part will be done in Lab 3, so just show the data integration view through
screenshots as a deliverable for this lab.