
ETL

“makes data Intelligent”

by WK Source Health Analytical

TCS Confidential

CONTENTS

1. ETL Overview
2. ETL Process
3. Data Warehouse
4. Enterprise DWH
5. OLTP vs. OLAP
6. Data Mart
7. Data Cleansing
8. Data Mining
9. Decision Support System
10. Data Model
11. Schemas
12. ETL Tools and Market

May 4, 2016

1. ETL Overview

Extraction: Extracting data from the source system and other external sources. There are two approaches to get the data:
1. Push approach: The source system pushes the data to the warehouse landing folder in the form of extracts (flat files).
2. Pull approach: Programs running in the data warehouse system reach out to the source system and pull data.

Transformation: Transforms and maps the source to the target.
• Loads data into the staging area after filtering:
  a. Records with invalid data type and
  b. Records that do not satisfy the table or grid constraints.
• Checks valid record types and applies the transformation rules.
• Segregates the input stream to multiple output streams.

Loading: Loads the transformed data into the appropriate tables and grid.
• Loads only valid records and extracts exception reports.
• Archives the interface files.
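The three steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the deck's actual implementation: the flat-file layout, the field names (`id`, `amount`, `region`), the "region must not be empty" constraint, and the choice to split the output by region are all assumptions made for the example.

```python
import csv
import io

# Hypothetical flat-file extract pushed to the landing folder (illustrative data only).
RAW_EXTRACT = """id,amount,region
1,100.50,EAST
2,abc,WEST
3,75.00,
4,20.25,WEST
"""

def extract(text):
    """Extraction: read the flat-file extract into a list of dict records."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(records):
    """Transformation: filter out bad records, then map source fields to target types."""
    valid, rejected = [], []
    for rec in records:
        try:
            amount = float(rec["amount"])   # records with an invalid data type are rejected
        except ValueError:
            rejected.append((rec, "invalid data type"))
            continue
        if not rec["region"]:               # assumed constraint: region must not be empty
            rejected.append((rec, "constraint violation"))
            continue
        valid.append({"id": int(rec["id"]), "amount": amount, "region": rec["region"]})
    return valid, rejected

def load(valid, target):
    """Loading: write only valid records, one output stream per region (an assumption)."""
    for rec in valid:
        target.setdefault(rec["region"], []).append(rec)
    return target

valid, rejected = transform(extract(RAW_EXTRACT))
warehouse = load(valid, {})
# Exception report for the records that were filtered out.
exception_report = [f"record {rec['id']}: {reason}" for rec, reason in rejected]
```

Here `warehouse` stands in for the target tables, and `exception_report` lists the rejected records with the reason each was filtered out.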

2. ETL Process

[Diagram: Extract -> Transformation -> Load. Labels: Vendor Files, OLTP system, Meta Data Tables, Vendor Data Application, Granular Data, Aggregate Data, Vendor Data Grid]

3. Data Warehouse

• A data warehouse is a model that considerably enhances the ability of users to analyze data sets.
• Data from various online transaction processing (OLTP) applications and other sources is selectively extracted and organized on the data warehouse database for use by analytical applications and user queries.
• Data from online transaction processing systems and other sources is extracted, cleansed and organized, which enables users to make business decisions based on facts.
• Data warehousing emphasizes capturing data from diverse sources for useful analysis and access.
• It is designed for query and analysis rather than for transaction processing.
• Data in the data warehouse must be subject oriented, integrated, time referenced and non-volatile.

4. Enterprise Data Warehouse

• An enterprise data warehouse (EDW) is a data store designed to produce a single, comprehensive view of data that an organization accumulates during its course of operations. It is a data warehouse with multiple subjects. This supports transforming data into valuable information for managers and others to make better business decisions.

5. OLTP vs. OLAP

OLTP: Online Transaction Processing
• Operational system.
• Data entered by the users responsible for day-to-day operation.
• Large number of users will be using the system.
• Normalized data.

OLAP: Online Analytical Processing
• Data warehouse system.
• Used by managers for the decision-making processes.
• Used by minimal number of users.
• De-normalized data.

Managing constraints, segments, backup and recovery is simple in one of the two systems and complex in the other.

6. Data Mart

• Is a data repository.
• Contains the snapshot of the operational data.
• Subsets of larger data warehouses.
• Focus on a particular subject or department.

[Diagram: Data Warehouse feeding Data Mart 1, Data Mart 2, Data Mart 3]
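As a rough illustration of "a subset of a larger data warehouse focused on one department", the sketch below filters warehouse rows by department. The row layout, department names, and the helper `build_data_mart` are hypothetical and exist only for this example.

```python
# Hypothetical warehouse rows; departments and amounts are illustrative.
warehouse_rows = [
    {"department": "sales", "amount": 120},
    {"department": "finance", "amount": 300},
    {"department": "sales", "amount": 80},
]

def build_data_mart(rows, department):
    """Return the subset of warehouse rows that belongs to one department."""
    return [row for row in rows if row["department"] == department]

sales_mart = build_data_mart(warehouse_rows, "sales")
finance_mart = build_data_mart(warehouse_rows, "finance")
```

In practice a data mart would more likely be a separate schema or set of tables, but the relationship to the warehouse is the same: a subject-focused subset.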

7. Data Cleansing

• Also referred to as Data Scrubbing.
• It is the act of detecting and removing and/or correcting data that is incorrect, out-of-date, redundant, incomplete or formatted incorrectly.
• The goal of data cleansing is not just to clean up the data in a database but also to bring consistency to different sets of data that have been merged from separate databases.
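A small Python sketch of these ideas, under assumed rules: records are matched on a normalized e-mail address, records without an e-mail count as incomplete, and the most recently updated copy wins. The sample records and the `cleanse` helper are hypothetical.

```python
from datetime import date

# Hypothetical customer records merged from two separate databases (illustrative data).
records = [
    {"name": " Alice Smith ", "email": "ALICE@EXAMPLE.COM", "updated": date(2016, 1, 5)},
    {"name": "alice smith", "email": "alice@example.com", "updated": date(2015, 3, 1)},
    {"name": "Bob Jones", "email": "", "updated": date(2016, 2, 1)},
]

def cleanse(rows):
    """Drop incomplete rows, fix formatting, and keep the newest copy of each duplicate."""
    latest = {}
    for row in rows:
        email = row["email"].strip().lower()   # bring formatting into a consistent form
        if not email:                          # incomplete record: no e-mail address
            continue
        cleaned = {
            "name": row["name"].strip().title(),
            "email": email,
            "updated": row["updated"],
        }
        current = latest.get(email)
        # Redundant or out-of-date copies lose to the most recently updated record.
        if current is None or cleaned["updated"] > current["updated"]:
            latest[email] = cleaned
    return list(latest.values())

clean = cleanse(records)
```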

8. Data Mining

• Data mining is the process of analyzing data from different perspectives and summarizing it into useful information.
• Data mining software is one among the number of analytical tools for analyzing data.
• It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified.
• Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.
• Used by business intelligence organizations and analysts.
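To make "finding correlations among fields" concrete, the sketch below computes the Pearson correlation for every pair of fields and picks the strongest one. The three fields and their values are invented for illustration, and real data mining tools use far richer techniques than pairwise correlation.

```python
from itertools import combinations
from math import sqrt

# Hypothetical fields from a sales table (illustrative values only).
fields = {
    "ad_spend": [10, 20, 30, 40, 50],
    "units_sold": [12, 24, 33, 46, 55],
    "returns": [5, 3, 6, 2, 4],
}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Correlate every pair of fields and find the strongest relationship.
correlations = {
    (a, b): pearson(fields[a], fields[b]) for a, b in combinations(fields, 2)
}
strongest = max(correlations, key=lambda pair: abs(correlations[pair]))
```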

9. Decision Support System (DSS)

Refers to an interactive computerized system that gathers and presents data from a wide range of sources, typically for business purposes. DSS applications are systems and subsystems that help people to make decisions based on data that is gathered from a wide range of sources.

Example: A national on-line book seller wants to begin selling its products internationally but first needs to determine if that will be a wise business decision. The vendor can use a DSS to gather information from its own resources to determine if the company has the ability or potential ability to expand its business. The DSS will collect and analyze the data and then present it in a way that can be interpreted by humans.

“DSS applications are not single information resources but the combination of integrated resources working together.”

10. Data Model

Model in which the data is represented and accessed. It is the integrated collection of concepts for describing data, relationships between data, and constraints on data.

A data model says:
a. What information is to be contained in a database?
b. How will the information be used?
c. How will the items in the database be related to each other?

Example: A data model might specify that a customer is represented by a customer name and credit card number, a product by a product code and price, and that there is a one-to-many relation between a customer and a product.

“One of the most widely used methods for developing data models is the entity-relationship model.”
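The customer and product example can be written down as a small sketch using Python dataclasses. The class and attribute names follow the example in the text; the sample values are placeholders, and holding the "many" side as a list on the customer is just one way to express the one-to-many relation.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Product:
    product_code: str
    price: float

@dataclass
class Customer:
    name: str
    credit_card_number: str
    # One-to-many relation: one customer is related to many products.
    products: List[Product] = field(default_factory=list)

# Placeholder values for illustration only.
customer = Customer(name="Jane Doe", credit_card_number="0000-0000-0000-0000")
customer.products.append(Product(product_code="BK-001", price=12.99))
customer.products.append(Product(product_code="BK-002", price=8.50))
```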

11. Schemas

Schema is the structure of a database or relational databases. There are two types of schemas:

1. Star Schema: The star schema is the simplest style of data warehouse schema. It consists of a few "fact tables" referencing any number of "dimension tables". The fact tables hold the main data, while the usually smaller dimension tables describe each value of a dimension and can be joined to fact tables as needed.

Star Schema Example:
[Diagram: fact table SALES in the center, surrounded by the dimension tables Channels, Customers, Dates and Products]
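A minimal star schema can be tried out with Python's built-in `sqlite3` module. The sketch below keeps only two of the dimensions from the example (products and customers); the column names and sample rows are assumptions made for illustration.

```python
import sqlite3

star = sqlite3.connect(":memory:")
star.executescript("""
-- Dimension tables describe each value of a dimension.
CREATE TABLE products  (product_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);

-- The fact table holds the main data and references the dimensions.
CREATE TABLE sales (
    sale_id     INTEGER PRIMARY KEY,
    product_id  INTEGER REFERENCES products(product_id),
    customer_id INTEGER REFERENCES customers(customer_id),
    amount      REAL
);

INSERT INTO products  VALUES (1, 'Book'), (2, 'Pen');
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Raj');
INSERT INTO sales     VALUES (1, 1, 1, 20.0), (2, 2, 1, 5.0), (3, 1, 2, 20.0);
""")

# Join the fact table to a dimension table as needed: total sales per product.
sales_by_product = star.execute("""
    SELECT p.name, SUM(s.amount)
    FROM sales AS s
    JOIN products AS p ON p.product_id = s.product_id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()
```

Each query touches the fact table plus only the dimensions it needs, which is what keeps star-schema queries simple.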

2. Snowflake Schema: A snowflake schema is a term that describes a star schema structure normalized through the use of outrigger tables, i.e. dimension table hierarchies are broken into simpler tables.

Snowflake Schema Example: In Snowflake schema, the example diagram shown below has 4 dimension tables, 4 lookup tables and one fact table. The reason is that hierarchies (category, branch, state, and month) are being broken out of the dimension tables (Product, Organization, Location and Time) respectively and shown separately.

[Diagram: fact table Sales; dimension tables Product, Organization, Location and Time; lookup tables Product category lookup, Branch lookup, State lookup and Month lookup]
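The same idea can be sketched for one branch of the snowflake: the category hierarchy is broken out of the Product dimension into its own lookup table, so a query by category needs one extra join. Table and column names loosely follow the example; the sample rows are invented.

```python
import sqlite3

snowflake = sqlite3.connect(":memory:")
snowflake.executescript("""
-- Lookup (outrigger) table: the category hierarchy broken out of the dimension.
CREATE TABLE product_category (
    category_id   INTEGER PRIMARY KEY,
    category_name TEXT
);

-- Dimension table now references its parent lookup table.
CREATE TABLE product (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT,
    category_id INTEGER REFERENCES product_category(category_id)
);

-- Fact table.
CREATE TABLE sales (
    sale_id    INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES product(product_id),
    amount     REAL
);

INSERT INTO product_category VALUES (1, 'Stationery'), (2, 'Books');
INSERT INTO product VALUES (1, 'Pen', 1), (2, 'Pencil', 1), (3, 'Novel', 2);
INSERT INTO sales   VALUES (1, 1, 5.0), (2, 2, 2.0), (3, 3, 20.0);
""")

# Fact -> dimension -> lookup: total sales per product category.
sales_by_category = snowflake.execute("""
    SELECT c.category_name, SUM(s.amount)
    FROM sales AS s
    JOIN product AS p          ON p.product_id  = s.product_id
    JOIN product_category AS c ON c.category_id = p.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
```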

Star vs. Snowflake Schema:

Star schema
• Dimension table will not have any parent table.
• Hierarchies for the dimensions are stored in the dimensional table itself.

Snowflake schema
• Dimension table will have one or more parent tables.
• Hierarchies are broken into separate tables.

12. ETL Tools and Market

No  ETL Tool                            Version   Vendor
1   Oracle Warehouse Builder (OWB)      11gR1     Oracle
2   Data Integrator (BODI)              11.7      Business Objects
3   IBM Information Server (Ascential)  8.0.1     IBM
4   SAS Data Integration Studio         3.4       SAS Institute
5   PowerCenter                         8.5.1     Informatica
6   Oracle Data Integrator              4.1       Oracle
7   Data Migrator                       7.6       Information Builders
8   Integration Services                2005/9.0  Microsoft
9   Talend Open Studio                  1.1       Talend
10  DataFlow                            6         Group 1 Software (Sagent)
11  Data Integrator                     8.12      Pervasive
12  Transformation Server               5.4       DataMirror
13  Transformation Manager              5.2.2     ETL Solutions Ltd.
14  Data Manager                        8.2       Cognos
15  DT/Studio                           3.1       Embarcadero Technologies
16  ETL4ALL                             4.2       IKAN
17  DB2 Warehouse Edition               9.1       IBM
18  Pentaho Data Integration            3.0       Pentaho

Thank You.