
Volume 1, No. 10, December 2012 ISSN 2278-1080
The International Journal of Computer Science & Applications (TIJCSA)
RESEARCH PAPER
Available Online at http://www.journalofcomputerscience.com/

Data Warehousing Concept Using ETL Process For SCD Type-1


K.Srikanth1, N.V.E.S.Murthy2, J.Anitha3

1 Andhra University, M.Tech (Ph.D), Visakhapatnam, India. srikanthwizard@gmail.com
2 Andhra University, Professor, Visakhapatnam, India. dr nvesmurthy@rediffmail.com
3 Andhra University, M.Tech (Ph.D), Visakhapatnam, India. anithanv28@gmail.com

Abstract: A Type 1 change overwrites an existing dimensional attribute with new information. In the customer name-change example, the new name overwrites the old name, and the old value is lost. A Type 1 change updates only the attribute; it does not insert new records and affects no keys. The new incoming record (changed/modified data set) replaces the existing old record in the target. This approach is easy to implement but does not maintain any history of prior attribute values. Slowly Changing Dimensions (SCDs) are dimensions whose data changes slowly, rather than on a time-based, regular schedule.

Keywords: ETL; Metadata; Mapping; Transformation.

I. INTRODUCTION

With Slowly Changing Dimensions (SCDs), data changes slowly [1], rather than on a time-based, regular schedule. For example, your database may have a dimension that tracks the sales records of your company's salespeople. Creating sales reports seems simple enough until a salesperson is transferred from one regional office to another. How do you record such a change in your sales dimension? You could calculate the sum or average of each salesperson's sales, but using those figures to compare salespeople's performance might be misleading. If the transferred salesperson used to work in a hot market where sales were easy and now works in a market where sales are infrequent, his or her totals will look much stronger than those of the other salespeople in the new region. Alternatively, you could create a second salesperson record and treat the transferred person as a new salesperson, but that creates problems of its own. Dimensions that change over time are called Slowly Changing Dimensions. For instance, a product's price changes over time, and people change their names for various reasons [3]. These are a few examples of Slowly Changing Dimensions, since their values change over a period of time.

2012, http://www.journalofcomputerscience.com - TIJCSA All Rights Reserved

Under SCD Type-1, the new incoming record (changed/modified data set) replaces the existing old record in the target. This paper uses source data from the Oracle EMP table (Table 1) to show how SCD Type-1 modifies the data and stores it in the target EMP table.

A. Implementation: Source:

Table 1: Oracle SQL Query on EMP Table

II. SOURCE TABLE AND SOURCE ANALYZER

When you add a relational source definition to a mapping, you need to connect it to a Source Qualifier transformation. The Source Qualifier transformation represents the records that the Informatica Server reads when it runs a session (Figure 1).
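The Source Qualifier step can be pictured with a small Python sketch. It assumes sqlite3 as a stand-in for the Oracle source, and the EMP columns and rows are illustrative sample values, not the contents of Table 1. For relational sources the Source Qualifier usually issues a default SELECT over the columns connected downstream; the hypothetical helper `build_default_query` only approximates that behaviour.

```python
import sqlite3


def build_default_query(table, connected_ports):
    """Hypothetical helper: SELECT only the ports that are connected downstream."""
    return f"SELECT {', '.join(connected_ports)} FROM {table}"


def read_source(conn, table, connected_ports):
    """Return source records as dicts, the way a source qualifier passes rows on."""
    cursor = conn.execute(build_default_query(table, connected_ports))
    return [dict(zip(connected_ports, record)) for record in cursor.fetchall()]


# sqlite3 stands in for the Oracle source; these rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (EMPNO INTEGER, ENAME TEXT, JOB TEXT, SAL REAL)")
conn.executemany(
    "INSERT INTO EMP VALUES (?, ?, ?, ?)",
    [(7369, "SMITH", "CLERK", 800.0), (7499, "ALLEN", "SALESMAN", 1600.0)],
)

# Only EMPNO, ENAME and SAL are "connected", so JOB is never read.
rows = read_source(conn, "EMP", ["EMPNO", "ENAME", "SAL"])
```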

Figure 1: Source Table and Source Analyzer


III. TARGET TABLE AND TARGET DESIGNER

Target definitions define the structure of tables in the target database, or the structure of the file targets that the PowerCenter Server creates when you run a workflow. If you add a target definition to the repository for a table that does not yet exist in the relational database, you need to create the target table in your target database (Figure 2). You do this by generating and executing the necessary SQL code within the Warehouse Designer.
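As a rough illustration of generating and executing target DDL, the following Python sketch turns a target definition into a CREATE TABLE statement and runs it. The table name `EMP_TARGET`, its columns, and the use of sqlite3 instead of an Oracle target are assumptions for illustration; the Warehouse Designer generates its own DDL.

```python
import sqlite3

# Hypothetical target definition (column name, SQL type); EMPKEY is a surrogate key.
TARGET_DEF = [
    ("EMPKEY", "INTEGER PRIMARY KEY"),
    ("EMPNO", "INTEGER NOT NULL"),
    ("ENAME", "TEXT"),
    ("SAL", "REAL"),
]


def generate_ddl(table, columns):
    """Turn a target definition into a CREATE TABLE statement."""
    column_sql = ",\n    ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return f"CREATE TABLE {table} (\n    {column_sql}\n)"


conn = sqlite3.connect(":memory:")
ddl = generate_ddl("EMP_TARGET", TARGET_DEF)
conn.execute(ddl)  # execute the generated SQL against the target database

# PRAGMA table_info returns one row per column; index 1 is the column name.
created_columns = [info[1] for info in conn.execute("PRAGMA table_info(EMP_TARGET)")]
print(created_columns)  # ['EMPKEY', 'EMPNO', 'ENAME', 'SAL']
```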

Figure 2: Target Table and Target Designer

IV. EXPRESSION TRANSFORMATION IN INFORMATICA

The Expression transformation is a connected, passive transformation used to calculate values on a single row [5]. Examples of such calculations are concatenating first and last names, adjusting employee salaries, and converting strings to dates. An Expression transformation can also be used to test conditional statements before passing the data to other transformations.

A. Creating an Expression Transformation:
Follow these steps to create an Expression transformation:
1. In the Mapping Designer, create a new mapping or open an existing mapping.
2. On the toolbar, click Transformation -> Create and select the Expression transformation (Figure 3).
3. Enter a name, click Create, and then click Done.
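The row-by-row nature of a passive transformation can be sketched in Python as a function applied to each row independently, so the number of rows never changes. The port names (`FIRST_NAME`, `LAST_NAME`, `HIRE_DATE_STR`) and the 10% salary adjustment are hypothetical, chosen only to mirror the three example calculations above.

```python
from datetime import datetime


def expression(row):
    """Compute output ports for one row; a passive transformation keeps the row count."""
    out = dict(row)  # pass-through ports; the input row is not modified
    out["FULL_NAME"] = f"{row['FIRST_NAME']} {row['LAST_NAME']}"  # concatenation
    out["NEW_SAL"] = round(row["SAL"] * 1.10, 2)  # hypothetical 10% adjustment
    # string -> date conversion
    out["HIRE_DATE"] = datetime.strptime(row["HIRE_DATE_STR"], "%Y-%m-%d").date()
    return out


source = [
    {"FIRST_NAME": "John", "LAST_NAME": "Smith", "SAL": 800.0, "HIRE_DATE_STR": "1980-12-17"},
    {"FIRST_NAME": "Ann", "LAST_NAME": "Allen", "SAL": 1600.0, "HIRE_DATE_STR": "1981-02-20"},
]
result = [expression(row) for row in source]  # one output row per input row
```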


Figure 3: Diagram for Expression Transformation

Figure 4: Creating Expression port logic

You can add ports to an Expression transformation either by selecting and dragging ports from other transformations or by opening the Expression transformation and creating the ports manually (Figure 4). Here we add the port Insert_flag with a string datatype. In the Expression transformation, Insert_flag evaluates the employee key to either TRUE or FALSE: TRUE when EMPKEY is null, FALSE otherwise.

IIF(ISNULL(EMPKEY), TRUE, FALSE)

V. ROUTER TRANSFORMATION IN INFORMATICA

The Router transformation is an active, connected transformation [8]. It is similar to the Filter transformation in that it tests a condition and filters the data. In a Filter transformation, you can specify only one condition, and rows that do not satisfy it are dropped (Figure 5). In a Router transformation, by contrast, you can specify more than one condition, and the data that meets each test condition is routed to its own output group [6]. Use a Router transformation when you need to test the same input data against multiple conditions.
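The Insert_flag expression and the Filter/Router difference can be sketched in plain Python. This is a simplified model, not PowerCenter code: EMPKEY is assumed to be null (`None`) when the employee is not yet in the target, and the `HighSal` group is a hypothetical extra condition added to show that a row meeting several group conditions is passed to each of those groups, while rows meeting none go to the default group.

```python
def add_insert_flag(row):
    """Python analogue of IIF(ISNULL(EMPKEY), TRUE, FALSE)."""
    return {**row, "Insert_flag": row["EMPKEY"] is None}


def filter_rows(rows, condition):
    """Filter: a single condition; rows that fail it are dropped."""
    return [row for row in rows if condition(row)]


def route_rows(rows, groups):
    """Router: test every row against every group condition.

    A row is passed to each group whose condition it meets; rows that meet
    no condition go to the DEFAULT group instead of being dropped.
    """
    routed = {name: [] for name in groups}
    routed["DEFAULT"] = []
    for row in rows:
        matched = [name for name, condition in groups.items() if condition(row)]
        for name in matched:
            routed[name].append(row)
        if not matched:
            routed["DEFAULT"].append(row)
    return routed


# EMPKEY is None when the employee was not found in the target (illustrative rows).
rows = [add_insert_flag(r) for r in (
    {"EMPNO": 7369, "EMPKEY": None, "SAL": 800.0},
    {"EMPNO": 7499, "EMPKEY": 1, "SAL": 1600.0},
)]

groups = {
    "Insert": lambda r: r["Insert_flag"],
    "Update": lambda r: not r["Insert_flag"],
    "HighSal": lambda r: r["SAL"] > 1000,  # hypothetical extra group
}
routed = route_rows(rows, groups)
```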

A. Creating a Router Transformation
Follow these steps to create a Router transformation:
1. In the Mapping Designer, create a new mapping or open an existing mapping.
2. On the toolbar, click Transformation -> Create.
3. Select the Router transformation, enter a name, click Create, and then click Done.
4. Select the ports from the upstream transformation and drag them to the Router transformation. You can also create input ports manually on the Ports tab.

Figure 5: Creating Router Transformation

In the Router transformation we create two new groups, the first named Insert and the second named Update, with these group conditions:
Insert: Insert_flag = TRUE
Update: Insert_flag = FALSE

VI. UPDATE STRATEGY TRANSFORMATION IN INFORMATICA

The Update Strategy transformation is an active, connected transformation. It is used to insert, update, and delete records in the target table, and it can also reject records so that they never reach the target table [7]. When you design a target table, you need to decide what data should be stored in the target. If you want to maintain a history of the source in the target table, then for every change in a source record you insert a new record in the target table. If you want the target table to hold an exact copy of the source data, then whenever the source data changes you update the corresponding records in the target [2]. The design of the target table decides how changes to existing rows are handled (Figure 6). In Informatica, you can set the update strategy at two different levels:


Session Level: Configuring the update strategy at the session level instructs the Integration Service either to treat all rows in the same way (insert, update, or delete) or to use the instructions coded in the session mapping to flag rows for different database operations.
Mapping Level: Use an Update Strategy transformation to flag rows for insert, update, delete, or reject.

A. Flagging Rows in a Mapping with Update Strategy:
You have to flag each row for inserting, updating, deleting, or rejecting. The constants and their numeric equivalents for each database operation are:
DD_INSERT: numeric value 0. Flags the row for insert.
DD_UPDATE: numeric value 1. Flags the row for update.
DD_DELETE: numeric value 2. Flags the row for delete.
DD_REJECT: numeric value 3. Flags the row for reject.

Figure 6: Update Strategy Transformation

This mapping uses only Insert and Update. The Update Strategy Expression attribute is set as follows:

Transformation Attribute | Value
Update Strategy Expression (Insert) | 0
Update Strategy Expression (Update) | 1
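A minimal Python sketch of how these flags behave, assuming an in-memory dictionary keyed on EMPKEY stands in for the target table. The DD_* values match the list above; the session-level settings are simplified to the hypothetical names `INSERT`, `UPDATE`, `DELETE`, and `DATA DRIVEN`, where only data-driven mode honours the per-row flag set in the mapping.

```python
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3

# Simplified session-level "treat rows as" settings (names are illustrative).
SESSION_OPERATIONS = {"INSERT": DD_INSERT, "UPDATE": DD_UPDATE, "DELETE": DD_DELETE}


def resolve_flag(treat_rows_as, mapping_flag):
    """'DATA DRIVEN' honours the flag set in the mapping; other settings apply to every row."""
    if treat_rows_as == "DATA DRIVEN":
        return mapping_flag
    return SESSION_OPERATIONS[treat_rows_as]


def apply_to_target(target, flagged_rows, treat_rows_as="DATA DRIVEN", key="EMPKEY"):
    """Apply (flag, row) pairs to an in-memory target keyed by `key`; return rejected rows."""
    rejected = []
    for mapping_flag, row in flagged_rows:
        flag = resolve_flag(treat_rows_as, mapping_flag)
        if flag == DD_INSERT:
            target[row[key]] = dict(row)
        elif flag == DD_UPDATE:
            target[row[key]].update(row)  # Type 1: overwrite in place, no history kept
        elif flag == DD_DELETE:
            target.pop(row[key], None)
        elif flag == DD_REJECT:
            rejected.append(row)  # never reaches the target
        else:
            raise ValueError(f"unknown update strategy flag: {flag}")
    return rejected


target = {}
apply_to_target(target, [(DD_INSERT, {"EMPKEY": 1, "EMPNO": 7369, "SAL": 800.0})])
apply_to_target(target, [(DD_UPDATE, {"EMPKEY": 1, "EMPNO": 7369, "SAL": 950.0})])
```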

VII. SEQUENCE GENERATOR TRANSFORMATION


The Sequence Generator is a passive, connected transformation that generates numeric values. Use the Sequence Generator to create unique primary key values [5], replace missing primary keys, or cycle through a sequential range of numbers.
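The numbering behaviour can be modelled with a small Python class. The parameter names `start_value`, `increment_by`, `end_value`, and `cycle` are loosely modelled on the transformation's properties and are assumptions here; the sketch assumes a positive increment and raises an error when the range is exhausted without cycling.

```python
class SequenceGenerator:
    """Simplified model of a sequence generator (not Informatica's implementation)."""

    def __init__(self, start_value=1, increment_by=1, end_value=None, cycle=False):
        self.start_value = start_value
        self.increment_by = increment_by
        self.end_value = end_value
        self.cycle = cycle
        self._next = start_value

    def nextval(self):
        value = self._next
        if self.end_value is not None and value > self.end_value:
            if not self.cycle:
                raise OverflowError("sequence passed its end value")
            value = self.start_value  # cycle back to the start of the range
        self._next = value + self.increment_by
        return value


keys = SequenceGenerator()  # 1, 2, 3, ... for unique key values
cycler = SequenceGenerator(start_value=1, end_value=3, cycle=True)
cycled = [cycler.nextval() for _ in range(5)]  # cycles through the range 1..3
```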

In a data warehouse environment, we mostly use it to generate surrogate keys. When we want to maintain history, we need a key other than the primary key to uniquely identify each record, so we create a sequence 1, 2, 3, 4, and so on (Figure 7) and use this sequence as the key. For example, if EMPNO is the key, we can keep only one record per employee in the target and cannot maintain history [10]. So we use a surrogate key as the primary key instead of EMPNO.

A. Sequence Generator Ports:
The Sequence Generator transformation provides two output ports: NEXTVAL and CURRVAL.

We cannot edit or delete these ports. Likewise, we cannot add ports to the transformation.

NEXTVAL: Use the NEXTVAL port to generate sequence numbers by connecting it to a transformation or target.
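The argument for a surrogate key can be shown with two versions of one employee. In this Python sketch `itertools.count` stands in for the NEXTVAL port, and the rows are illustrative: keyed on EMPNO, the second version replaces the first; keyed on a NEXTVAL surrogate, both survive.

```python
from itertools import count


def load_keyed_by_empno(versions):
    """Natural key: a later version of the same employee replaces the earlier one."""
    target = {}
    for row in versions:
        target[row["EMPNO"]] = dict(row)
    return target


def load_keyed_by_surrogate(versions, nextval):
    """Surrogate key: every stored row gets a fresh NEXTVAL, so versions can coexist."""
    target = {}
    for row in versions:
        empkey = next(nextval)
        target[empkey] = {"EMPKEY": empkey, **row}
    return target


# Two versions of one employee, before and after a department transfer (illustrative).
versions = [
    {"EMPNO": 7369, "ENAME": "SMITH", "DEPTNO": 20},
    {"EMPNO": 7369, "ENAME": "SMITH", "DEPTNO": 30},
]
by_empno = load_keyed_by_empno(versions)
by_surrogate = load_keyed_by_surrogate(versions, count(start=1))  # count() stands in for NEXTVAL
```

Keeping both versions is what history tracking needs; a Type-1 mapping only draws a NEXTVAL for employees that are new to the target and overwrites existing rows.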

Figure 7: Sequence Generator Transformation

VIII. SCD TYPE-1 MAPPING DESIGN

Figure 8 shows the complete Slowly Changing Dimension Type-1 mapping design flow, that is, how the source data is processed and loaded into the target.


Figure 8: Slowly Changing Dimensions (SCDs) Flow

A. Insert: New employee records are inserted, and existing records are updated with the complete information in the target table (Table 2). The same data is displayed graphically during ETL processing; the data available after the insert and update is shown in Table 3.

Table 2: New record inserted table


Table 3: Designer Preview Data

Result: The preview data displayed in the Designer for Slowly Changing Dimensions (SCDs) Type-1 shows that the new incoming record (changed/modified data set) replaces the existing old record in the target. Source data: Table 1. Target data: Table 2 and Table 3 (graphical view).
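The whole Type-1 flow can be condensed into one Python function as a sketch. It assumes an in-memory dictionary as the target and a lookup on EMPNO to obtain EMPKEY (the paper does not show how EMPKEY is looked up); `itertools.count` stands in for the Sequence Generator, and the employee rows are illustrative.

```python
from itertools import count


def scd_type1_load(source_rows, target, nextval):
    """One SCD Type-1 run. `target` maps EMPKEY -> row; `nextval` yields new surrogate keys."""
    # Assumed lookup on the target: natural key EMPNO -> existing EMPKEY.
    key_by_empno = {row["EMPNO"]: empkey for empkey, row in target.items()}
    for src in source_rows:
        empkey = key_by_empno.get(src["EMPNO"])
        insert_flag = empkey is None  # IIF(ISNULL(EMPKEY), TRUE, FALSE)
        if insert_flag:  # Insert group -> DD_INSERT
            empkey = next(nextval)  # Sequence Generator NEXTVAL
            target[empkey] = {"EMPKEY": empkey, **src}
            key_by_empno[src["EMPNO"]] = empkey
        else:  # Update group -> DD_UPDATE
            target[empkey].update(src)  # overwrite: the old values are lost
    return target


nextval = count(start=1)
target = {}
# First run: both employees are new, so both are inserted.
scd_type1_load([{"EMPNO": 7369, "ENAME": "SMITH", "SAL": 800.0},
                {"EMPNO": 7499, "ENAME": "ALLEN", "SAL": 1600.0}], target, nextval)
# Second run: SMITH's salary changed (overwritten in place); WARD is new.
scd_type1_load([{"EMPNO": 7369, "ENAME": "SMITH", "SAL": 950.0},
                {"EMPNO": 7521, "ENAME": "WARD", "SAL": 1250.0}], target, nextval)
```

Like the Update group described earlier, this sketch updates every matched row; comparing the incoming attributes with the stored ones first would skip rows that have not changed.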

IX. CONCLUSIONS AND FUTURE WORK

Extraction-Transformation-Loading (ETL) tools are pieces of software responsible for extracting data from several sources. In this paper, we have focused on the problem of Type 1 changes. A Type 1 change updates only the attribute; it does not insert new records and affects no keys. It is easy to implement but does not maintain any history of prior attribute values. Slowly Changing Dimensions (SCDs) are dimensions whose data changes slowly, rather than on a time-based, regular schedule. Under the framework of conventional ETL, the ETL process is defined [7] for different data sources: a program or script is developed and compiled to retrieve records from the database. In this paper, a useful engineering study for ETL tool selection was developed. In the end, all three initial objectives were achieved [9]: comprehensive ETL criteria were identified, testing procedures were developed, and this work was applied to commercial ETL tools. The study covered all major aspects of ETL usage and can be used to effectively compare and evaluate various ETL tools.

REFERENCES
[1] W. H. Inmon, D. Strauss, and G. Neushloss, DW 2.0: The Architecture for the Next Generation of Data Warehousing. Burlington, MA: Morgan Kaufmann, 2008, pp. 215-229.
[2] R. J. Davenport, "ETL vs. ELT: A Subjective View," InSource IT Consulting Ltd., U.K., September 2007. [Online]. Available at: http://www.insource.co.uk/pdf/ETL_ELT.pdf.
[3] T. Jun, C. Kai, Feng Yu, and T. Gang, "The Research and Application of ETL Tools in Business Intelligence Project," in Proc. International Forum on Information Technology and Applications, IEEE, 2009, pp. 620-623.
[4] R. Kimball and J. Caserta, The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming, and Delivering Data. John Wiley & Sons, 2004.
[5] W. Labio and H. Garcia-Molina, "Efficient Snapshot Differential Algorithms for Data Warehousing," VLDB, 1996.
[6] Informatica PowerCenter. Available at: www.informatica.com/ products/ data integration/ power center/ default.htm.
[7] Teradata. Available at: www.teradata.com.
[8] Sun SPARC M9000 Processor. Available at: http://www.sun.com/servers/highend/m9000/.
[9] L. Troy and C. Pydimukkala, "How to Use PowerCenter with Teradata to Load and Unload Data," Informatica Corporation. [Online]. Available at: www.myinformatica.com.
[10] J. Widom, "Research Problems in Data Warehousing," CIKM, 1995.
