- For dimensional analysis
- Data mining

Roadmap to Data Warehousing
- Data is extracted, transformed, and cleansed
- Stored in a database (RDBMS, MDD)
- Query and reporting systems
- Executive Information Systems and Decision Support Systems

Data Warehousing Process Overview
The major components of a data warehousing process:
- Data sources: internal, external (data providers), OLAP, ERP, Web data
- Data extraction: using custom-written or commercial software (ETL tools)
- Data loading: data is loaded into a staging area to be transformed and cleansed, then loaded into the warehouse
- Comprehensive database: the enterprise data warehouse (EDW) that supports all decision analysis
- Metadata: to ease indexing and search
- Middleware tools: to enable access to the DW; these include data mining tools, OLAP, reporting tools, and data visualization tools

Data Integration and the Extraction, Transformation, and Load (ETL) Process
Extraction, transformation, and load (ETL) is a data warehousing process that consists of:
- Extraction: reading data from a database
- Transformation: converting the extracted data from its previous form into the form it needs to be in so that it can be placed into a data warehouse, or simply another database
- Load: putting the data into the data warehouse
During the extraction process, the input files are written to a set of staging tables to facilitate the load process. (A minimal Python sketch of these three steps is appended at the end of these notes.)

Sample ETL Tools
- PowerMart/PowerCenter from Informatica
- Teradata Warehouse Builder from Teradata
- DataStage from Ascential Software
- SAS System from SAS Institute
- Sagent Solution from Sagent Software
- Hummingbird Genio Suite from Hummingbird Communications
- Talend Open Studio (open source)

ETL Process Flow
ETL -> Big Data

Big Data
Bottom line: any data that exceeds our current processing capability can be regarded as "big".
Big Data is high-volume, high-velocity, and/or high-variety.
- Walmart handles more than 1 million (10 lakh) customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes (1 PB = 1,000,000 GB) of data.
- Facebook handles 40 billion (1 billion = 100 crore) photos from its user base.

Challenges in the traditional approach:
- A recent survey says that 80% of the data created in the world is unstructured. The challenge is how this unstructured data can be structured before we attempt to understand and capture the most important data.
- Another challenge is how to store it.
- For sentiment analysis: Facebook, Twitter

Top tools used to store and analyze Big Data:
Apache Hadoop, Hive, Sqoop, Flume, Presto, Spark, Kafka, NiFi, and many more.

To Conclude
Data = Understanding. "The goal is to turn data into information, and information into insight."
Data drives the business and decision making.
Choosing a career path in data handling will be a worthwhile decision for the future.

Data Modelling: Points to Remember
- What are the dimension and fact tables? (How many?)
- Which schema is used: star or snowflake?
- Primary keys and foreign keys used
- Types of indexes used (bitmap, function-based, etc.)
- Performance tuning
- Bottlenecks
(A star-schema sketch illustrating these points is appended after the ETL sketch below.)

A word to your heart: Everything you've ever wanted is on the other side of fear.
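
Appendix A: Minimal ETL sketch
The extraction-transformation-load steps described above can be summarised in a few lines of code. The sketch below is illustrative only: it assumes a hypothetical source file sales.csv with columns order_id, customer, and amount, and a hypothetical warehouse table sales_fact; none of these names come from the notes, and a real ETL tool (Informatica, DataStage, Talend, etc.) would add staging, scheduling, and error handling.

```python
# Minimal ETL sketch (hypothetical file, table, and column names).
import csv
import sqlite3

def extract(path):
    """Extraction: read raw records from the source file."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    """Transformation: cleanse rows and convert them to the warehouse format."""
    cleaned = []
    for row in rows:
        if not row["order_id"]:                  # drop rows with no key
            continue
        cleaned.append((
            int(row["order_id"]),
            row["customer"].strip().title(),     # normalise customer names
            float(row["amount"]),
        ))
    return cleaned

def load(rows, db_path="warehouse.db"):
    """Load: write the transformed rows into the warehouse table."""
    con = sqlite3.connect(db_path)
    con.execute("""CREATE TABLE IF NOT EXISTS sales_fact
                   (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)""")
    con.executemany("INSERT OR REPLACE INTO sales_fact VALUES (?, ?, ?)", rows)
    con.commit()
    con.close()

if __name__ == "__main__":
    load(transform(extract("sales.csv")))
```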
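
Appendix B: Star-schema sketch
To make the data-modelling checklist concrete, the sketch below creates a simple star schema: one fact table whose foreign keys point to three dimension tables, plus an ordinary index on a dimension key. The table and column names are invented for illustration, and SQLite is used only because it ships with Python; it has no bitmap or function-based indexes, so a plain index stands in for those.

```python
# Star-schema sketch: fact table + dimension tables linked by foreign keys.
# All names here are hypothetical, not taken from the notes.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT, year INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, city TEXT, region TEXT);

-- Fact table: one row per sale; foreign keys reference the dimensions.
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key   INTEGER REFERENCES dim_store(store_key),
    quantity    INTEGER,
    amount      REAL
);

-- Index on a dimension key to speed up typical slice-and-dice queries.
CREATE INDEX idx_sales_product ON fact_sales(product_key);
""")

print("Star schema tables:",
      [r[0] for r in con.execute("SELECT name FROM sqlite_master WHERE type='table'")])
```
A snowflake schema would differ only in that the dimension tables themselves are further normalised (e.g. dim_product split into product and category tables).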