
DATA WAREHOUSING AND DATA MINING

Presented by: Rajan Gupta, Amarpreet Kaur, Ajay Singh Rathore

Basic terms and their meanings


Data: data is defined as meaningful facts and figures.

Database: the storage of data in paper or electronic files for future use, analysis, and retrieval is called a database.
For example: banking (customer information, accounts, loans and transactions); universities (student information, course registration); sales (customer, product and purchase information). Hence, a database is where the data resides.
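To make "storage for future use and retrieval" concrete, here is a minimal sketch of the banking example using Python's built-in sqlite3 module with an in-memory database. The table names, columns and sample rows are hypothetical, chosen only for illustration.

```python
import sqlite3

# In-memory database standing in for a bank's electronic files (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, customer_id INTEGER, balance REAL)"
)

# Store the data for future use.
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Asha"), (2, "Ben")])
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [(10, 1, 2500.0), (11, 1, 300.0), (12, 2, 900.0)],
)
conn.commit()

# Retrieve and analyse it later: total balance per customer.
rows = conn.execute(
    """
    SELECT c.name, SUM(a.balance)
    FROM customers c JOIN accounts a ON a.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
print(rows)  # [('Asha', 2800.0), ('Ben', 900.0)]
```

The same stored rows can be queried again later for a different analysis without re-entering them, which is the point of keeping data in a database.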

Main objectives of a producer

A producer wants to know:

Which are the lowest- and highest-margin customers?

Who are my customers and what products are they buying?

What is the most effective distribution channel?

What impact will new products/services have on revenue and margin?

These questions all concern meaningful facts and figures, but to answer them the raw data must first be transformed into information.
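As a sketch of turning raw data into an answer to the first question, the snippet below aggregates hypothetical sales records into a margin per customer. The record layout and the margin formula ((revenue - cost) / revenue) are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical raw sales records: (customer, product, revenue, cost).
sales = [
    ("Acme", "widget", 120.0, 90.0),
    ("Acme", "gadget", 80.0, 70.0),
    ("Bolt", "widget", 200.0, 110.0),
    ("Cray", "gadget", 50.0, 48.0),
]

def margin_by_customer(records):
    """Sum revenue and cost per customer, then return margin = (revenue - cost) / revenue."""
    revenue = defaultdict(float)
    cost = defaultdict(float)
    for customer, _product, rev, c in records:
        revenue[customer] += rev
        cost[customer] += c
    return {cust: (revenue[cust] - cost[cust]) / revenue[cust] for cust in revenue}

margins = margin_by_customer(sales)
lowest = min(margins, key=margins.get)
highest = max(margins, key=margins.get)
print(lowest, highest)  # Cray Bolt
```

Each individual record is just a fact; only after aggregation does it answer a business question.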

What is Data Warehouse?


A single, complete and consistent store of data, obtained from a variety of different sources and made available to end users in a way they can understand and use in a business context. Data warehousing is the process of transforming data into information and making it available to users quickly enough to make a difference. It refers to a single, centralized, and unified repository of data that works across the enterprise.

A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data that is used primarily in organizational decision making.
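The "time-variant" and "non-volatile" properties can be illustrated with a minimal in-memory sketch. It assumes a hypothetical append-only list of dated snapshots rather than any particular warehouse product: each load adds rows and nothing is overwritten, so history remains queryable.

```python
from datetime import date

# Hypothetical append-only "warehouse" table: rows are only ever added, never
# updated or deleted (non-volatile), and every row carries the date it describes
# (time-variant).
warehouse = []

def load_snapshot(as_of, records):
    """Append one dated snapshot of source records; existing rows are left untouched."""
    for customer, credit_limit in records:
        warehouse.append({"as_of": as_of, "customer": customer, "credit_limit": credit_limit})

def history(customer):
    """Return the customer's values over time, oldest first."""
    rows = [r for r in warehouse if r["customer"] == customer]
    return [(r["as_of"], r["credit_limit"]) for r in sorted(rows, key=lambda r: r["as_of"])]

# The operational system overwrote Asha's limit, but the warehouse keeps both values.
load_snapshot(date(2023, 1, 31), [("Asha", 1000), ("Ben", 500)])
load_snapshot(date(2023, 2, 28), [("Asha", 1500), ("Ben", 500)])
print(history("Asha"))
```

An operational system would typically keep only Asha's current limit; the warehouse keeps both, which is what allows trend analysis.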

Process of Data warehousing


Large corporations with ongoing businesses want to retain their data and analyze it to follow profitability trends in a complex environment. Data from diverse documents and sources is clubbed into one data model. Standardization means complying with a single system of keeping data. Normalization here refers to removing the duplicate data that has crept in because of the variety of processes. Finally, the data is consolidated and made available to different users across the organization.
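A minimal sketch of the clubbing, standardization and duplicate-removal steps, assuming two hypothetical source systems that use different field names and capitalization. The `standardize` and `consolidate` helpers are illustrative names, not part of any tool.

```python
# Two hypothetical source systems that record the same customers differently.
crm_rows = [{"name": "  rajesh KUMAR ", "city": "delhi"}, {"name": "Meena Shah", "city": "Mumbai"}]
billing_rows = [{"customer": "Rajesh Kumar", "town": "Delhi"}, {"customer": "Anil Rao", "town": "Pune"}]

def standardize(name, city):
    """Map a record onto one agreed format: whitespace-trimmed, title-cased name and city."""
    return {"name": " ".join(name.split()).title(), "city": city.strip().title()}

def consolidate(*sources):
    """Club all sources into one list, dropping records that appear in more than one source."""
    seen = set()
    merged = []
    for rows in sources:
        for row in rows:
            key = (row["name"], row["city"])
            if key not in seen:
                seen.add(key)
                merged.append(row)
    return merged

# Each source needs its own field mapping before it fits the single data model.
unified = consolidate(
    [standardize(r["name"], r["city"]) for r in crm_rows],
    [standardize(r["customer"], r["town"]) for r in billing_rows],
)
print(unified)
```

Duplicates can only be detected after standardization: the two "Rajesh Kumar" records differ as raw strings and match only once both are mapped to the same format.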

The Purpose of a Data Warehouse


Purpose of data warehousing:

Better business intelligence for end users

Reduction in time to access and analyze information

Consolidation of disparate information sources

Replacement of older, less-responsive decision support systems

Faster time to market for products and services

Data warehousing pitfalls


There are also disadvantages to using a data warehouse. Some of them are:

Over their life, data warehouses can have high costs.

Data warehouses can become outdated relatively quickly.

There is a cost to delivering suboptimal information to the organization.

There is often a fine line between data warehouses and operational systems. Duplicate, expensive functionality may be developed in the data warehouse that, in retrospect, should have been developed in the operational systems.

The time it takes to load the warehouse will also expand to …

Data Mining

Data Mining works with Warehouse Data


Data Warehousing provides the Enterprise with a memory

Data Mining provides the Enterprise with intelligence

What is Data mining?


Data mining refers to extracting, or "mining", knowledge from large amounts of data. Data mining is also known as knowledge discovery in databases (KDD).

Data mining software tools find hidden patterns and relationships in large pools of data and infer rules from them that can be used to predict future behavior and guide decision making. The major reason data mining has attracted a great deal of attention is the wide availability of data and the imminent need to turn that data into useful information and knowledge.
The mining of gold from sand or rocks is referred to as gold mining, not rock or sand mining. By the same logic, data mining should more appropriately have been named "knowledge mining from data".
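One common way of inferring rules from data is association-rule mining. The sketch below is a brute-force scan over item pairs (not an optimized algorithm such as Apriori), using hypothetical shopping baskets and thresholds. Support is the fraction of transactions containing the items; the confidence of A -> B is support(A and B) / support(A).

```python
from itertools import combinations

# Hypothetical market-basket transactions.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def pair_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Infer rules A -> B between single items that pass both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue  # the pair is too rare to be an interesting pattern
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs}, transactions)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules

for lhs, rhs, s, c in pair_rules(baskets):
    print(f"{lhs} -> {rhs}  support={s:.2f} confidence={c:.2f}")
```

With these thresholds the scan keeps four rules, such as butter -> bread (every basket with butter also has bread). The pair butter/milk reaches the support threshold, but neither direction reaches the confidence threshold, so it is discarded.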

Data Mining: A KDD Process


Figure: Data mining as the core of the knowledge discovery process. Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation.

The KDD process

Problem formulation

Data collection: subset the data (sampling might hurt if the data are highly skewed) and select features

Pre-processing: cleaning (name/address cleaning, reconciling different labels with the same meaning such as "annual" and "yearly"), duplicate removal, supplying missing values

Transformation: map complex objects, e.g. time-series data, to features, e.g. frequency

Choosing the mining task and mining method

Result evaluation and visualization

Knowledge discovery is an iterative process.
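The transformation step (time series to a frequency feature) can be sketched with a naive discrete Fourier transform in pure Python. The hourly readings and the choice of "dominant frequency" as the single feature are assumptions for illustration; a real pipeline would typically use an FFT library instead of this O(n^2) loop.

```python
import cmath
import math

def dft_magnitudes(series):
    """Magnitude of each discrete Fourier transform coefficient (naive O(n^2) version)."""
    n = len(series)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(series)))
        for k in range(n)
    ]

def dominant_frequency(series):
    """Map a time series to one feature: the non-zero frequency (cycles per series) with the most energy."""
    mags = dft_magnitudes(series)
    half = len(series) // 2
    # Skip k = 0 (the mean level) and the mirrored upper half of the spectrum.
    return max(range(1, half + 1), key=lambda k: mags[k])

# Hypothetical hourly readings over 24 hours with two peaks per day.
readings = [10 + 5 * math.cos(2 * math.pi * 2 * t / 24) for t in range(24)]
print(dominant_frequency(readings))  # 2
```

The 24 raw readings collapse into one number (2 cycles per day) that a mining method can compare across many series.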

Application Areas
Industry: Application
Finance: Credit card analysis
Insurance: Claims, fraud analysis
Telecommunication: Call record analysis
Transport: Logistics management
Consumer goods: Promotion analysis
Data service providers: Value-added data
Utilities: Power usage analysis

Purpose of Data Mining


Credit ratings/targeted marketing: Given a database of 100,000 names, which persons are the least likely to default on their credit cards? Identify likely responders to sales promotions.

Fraud detection: Which types of transactions are likely to be fraudulent, given the demographics and transactional history of a particular customer?

Customer relationship management: Which of my customers are likely to be the most loyal, and which are most likely to leave for a competitor?

Data Mining helps extract such information
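For the fraud-detection question, a very simple baseline compares a new transaction with the customer's own history and flags it when it lies far from that customer's mean. This z-score rule, the 3.0 threshold and the sample amounts are assumptions for illustration; it is not a trained fraud model and ignores demographics entirely.

```python
from statistics import mean, stdev

def is_suspicious(history, amount, threshold=3.0):
    """Flag an amount more than `threshold` sample standard deviations from this customer's past amounts."""
    if len(history) < 2:
        return False  # not enough history to judge
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return amount != mu  # every past amount was identical
    return abs(amount - mu) / sigma > threshold

# Hypothetical past transaction amounts for one customer.
past = [42.0, 55.0, 38.0, 61.0, 47.0]
print(is_suspicious(past, 50.0))   # False
print(is_suspicious(past, 900.0))  # True
```

Because the rule is relative to each customer's history, the same amount can be normal for one customer and suspicious for another.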

Conclusion
Data warehousing provides the means to change raw data into information for making effective business decisions; the emphasis is on information, not data. The data warehouse is the hub for decision-support data, whereas data mining is a useful tool with multiple algorithms that can be tuned for specific tasks. It can benefit business, medicine, and science, but more efficient algorithms are needed to speed up the data mining process.
