
Data warehouse models

A model is a way of representing the data in a data warehouse.

There are three models in which we can represent data:


 Enterprise DWH
 Data mart
 Virtual Warehouse model

Enterprise DWH: -

 This type of model collects and represents information about the entire organization.


 It is focused on the entire organization rather than on a single subject.

 Data from different operational systems is integrated and stored in the data warehouse.

Attributes:

 It provides a single version of the truth.


 It is implemented as a mission-critical environment, so it must stay reliable and available for the business-critical work that depends on it.
 It is scalable.

Data mart: -

A data mart is a subset of a data warehouse that focuses on a single subject area and its history.

Because of this, a data mart cannot provide complete details for all subject areas.

Virtual Warehouse model: -


"Virtual" means that the warehouse does not physically store the data; it only provides a virtual view over the underlying databases. In this model, users reach the data through access points (for example, views) defined over the source databases.

Why do we need it?

Ans: If we do not have a data warehouse but want to access data from multiple sources, users can create virtual databases.
A virtual warehouse gives users quick access to the data without building a separate physical warehouse, and it also provides a layer of abstraction over the underlying sources.
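A minimal sketch of the idea using Python's built-in `sqlite3` module, assuming two hypothetical operational tables (`store_sales`, `online_sales`) that live in one in-memory database for simplicity. The "virtual warehouse" here is the view `all_sales`: it stores no rows of its own, and every query against it is answered from the source tables at run time.

```python
import sqlite3

# Two hypothetical operational sources, modelled as tables in one in-memory database
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store_sales (item TEXT, amount REAL);
    CREATE TABLE online_sales (item TEXT, amount REAL);
    INSERT INTO store_sales VALUES ('book', 12.5), ('pen', 1.0);
    INSERT INTO online_sales VALUES ('book', 10.0);

    -- The virtual view: it holds no data, only the query that combines the sources
    CREATE VIEW all_sales AS
        SELECT 'store' AS channel, item, amount FROM store_sales
        UNION ALL
        SELECT 'online' AS channel, item, amount FROM online_sales;
""")

def total_by_item(conn):
    # Each call re-reads the source tables through the view
    rows = conn.execute(
        "SELECT item, SUM(amount) FROM all_sales GROUP BY item ORDER BY item"
    ).fetchall()
    return dict(rows)

print(total_by_item(conn))  # {'book': 22.5, 'pen': 1.0}
```

If a new row is inserted into `online_sales`, the next query through `all_sales` sees it immediately, because nothing was copied.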

Basic tasks of data mining

A task is an action performed by a data mining tool.


There are two types of tasks:
Predictive task
Descriptive Task

Predictive task:

Here, users predict the value of a target attribute by using the values of other, known attributes.
Because the value is predicted, it is an estimate: it is not guaranteed to be 100% correct.

Predictive tasks can be further classified into three types:

 Classification

 Regression

 Time series analysis

1. Classification:
 It uses a set of predefined classes; by assigning a record to one of them, we predict its target value.
 It predicts a discrete target variable, i.e. one with a finite, fixed set of values.
 Ex: Whether a person will buy a book or not.

C1 C2 Target attribute
- - True
- - False

In the table above, the value of the target attribute is predicted from the values of the independent attributes C1 and C2.

2. Regression:
 It is used to predict a continuous (numeric) target variable. Because the target is continuous, the predicted future value is an estimate rather than an exact value.
 It is based on mathematical formulas, for example fitting a straight line to the known data. Ex: Book price.
Book price varies day by day, week by week, etc.
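A minimal sketch of the book-price example, assuming invented weekly prices. It uses simple linear regression fitted by ordinary least squares (one common regression technique): the slope is b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², and the intercept is a = ȳ − b·x̄.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx               # slope
    a = mean_y - b * mean_x     # intercept
    return a, b

# Hypothetical data: week number -> book price
weeks = [1, 2, 3, 4]
prices = [10.0, 10.5, 11.0, 11.5]
a, b = fit_line(weeks, prices)
print(a + b * 5)  # 12.0, the estimated price for week 5
```

The prediction is a continuous number, not a class label, and for real (noisy) prices it would only be an estimate.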

3. Time series analysis:


 It is also a predictive task: the target value is predicted based on how it has changed over time.
 Ex: Heart beat rate.
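A minimal sketch, assuming invented heart-rate readings taken once per minute. It uses a simple moving-average forecast (the next value is predicted as the mean of the last few observations), which is only one of many time series methods.

```python
def moving_average_forecast(series, window):
    """Forecast the next value as the mean of the last `window` observations."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    recent = series[-window:]
    return sum(recent) / window

# Hypothetical heart-rate readings (beats per minute), in time order
readings = [72, 74, 73, 75, 77, 76]
print(moving_average_forecast(readings, 3))  # 76.0
```

The order of the readings matters here: the same values in a different time order would give a different forecast.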

Descriptive task:

These types of tasks enable us to determine patterns and relationships in data.


There are four types of descriptive tasks:

 Cluster analysis

 Association rule analysis

 Anomaly detection

 Summarization

1. Cluster analysis:
 It is also called grouping or segmentation.
 Records with similar attribute values are placed in the same group (cluster).
 Ex: In a class, students can be grouped based on their marks.
 Once clustering is done, patterns and relationships are easier to find.
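A minimal sketch of the marks example, assuming invented marks and k = 2 groups. It uses plain k-means (Lloyd's algorithm) on one attribute: each mark is assigned to the nearest group centre, each centre is moved to the mean of its group, and this repeats until the centres stop changing.

```python
def kmeans_1d(values, k, max_iterations=100):
    """Plain k-means on a list of numbers; assumes k >= 2."""
    lo, hi = min(values), max(values)
    # Start with k centroids spread evenly between the smallest and largest value
    centroids = [lo + i * (hi - lo) / (k - 1) for i in range(k)]
    for _ in range(max_iterations):
        # Assignment step: put each value in the group of its nearest centroid
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its group
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical student marks
marks = [35, 40, 42, 85, 90, 95]
print(kmeans_1d(marks, 2))  # ([39.0, 90.0], [[35, 40, 42], [85, 90, 95]])
```

No class labels are given in advance; the two groups (low and high scorers) emerge from the data, which is what makes clustering descriptive rather than predictive.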
2. Association rule analysis:

 In this model, users discover patterns in the form of rules between strongly associated attribute values (items).
 The goal is to find out how the attributes are related to each other.
 Ex: Retail services. Here we want to understand the relationship between products and consumers, for example which products customers tend to buy together.
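A minimal sketch of the two standard measures behind association rules, assuming invented shopping baskets. Support(X) is the fraction of transactions that contain itemset X, and confidence(X → Y) = support(X ∪ Y) / support(X).

```python
def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    return support_count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """confidence(X -> Y) = support(X ∪ Y) / support(X)"""
    return (support_count(antecedent | consequent, transactions)
            / support_count(antecedent, transactions))

# Hypothetical retail baskets
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support({"bread", "milk"}, transactions))        # 0.5
print(confidence({"bread"}, {"milk"}, transactions))   # 2 of the 3 bread baskets contain milk
```

A rule such as bread → milk is reported only when both values exceed user-chosen minimum thresholds.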
3. Anomaly detection:
This task identifies unusual records (anomalies or outliers) in the data, which helps us check the correctness of the data.
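A minimal sketch, assuming invented readings and a threshold of 2 standard deviations (the threshold is an assumption; 3 is also common for larger data sets). It uses the z-score method: a value is flagged when its distance from the mean, measured in standard deviations, exceeds the threshold.

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical sensor readings with one suspicious value
readings = [10, 11, 9, 10, 12, 10, 50]
print(zscore_outliers(readings))  # [50]
```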
4. Summarization:
 It produces a short, compact description of a large data set (for example, counts, totals or averages), so the user can see the patterns very easily.
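A minimal sketch of summarization, assuming invented (region, sale amount) records. Many detailed records are collapsed into a count, total and average per group, which is far easier to read than the raw data.

```python
from collections import defaultdict

def summarize(records):
    """Collapse (group, value) records into count, total and average per group."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return {
        key: {"count": len(vals), "total": sum(vals), "average": sum(vals) / len(vals)}
        for key, vals in groups.items()
    }

# Hypothetical sales records: (region, amount)
sales = [("north", 100), ("south", 50), ("north", 300), ("south", 70)]
print(summarize(sales))
```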

Data mining versus knowledge discovery in databases (KDD)

Data mining and KDD share the same goal: the “process of discovering knowledge from large amounts of data”.
The difference is that KDD is the overall process of extracting knowledge from large databases, and it uses data mining methods to do so.
In other words, KDD is a process with many phases, and data mining is one of its critical steps.

Phases of KDD: The KDD process consists of six steps.

1. Data integration.
2. Data selection.
3. Data transformation.
4. Data mining.
5. Pattern evaluation.
6. Knowledge representation.
1. Data integration:
In this phase, data is collected from different sources and integrated into a single store (the DWH).

2. Data selection:

In this phase, the data relevant to the analysis task is selected from the DWH.
3. Data transformation:
After selection, the data is transformed into the forms required for mining.

Data cleaning: It involves removing noisy and irrelevant data from the database.
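A minimal sketch of cleaning followed by transformation, assuming invented (student, marks) records. The cleaning rules shown (drop records with a missing value, drop exact duplicates) and the transformation (min-max normalization to the range [0, 1]) are common examples chosen for illustration, not the only options.

```python
def clean(records):
    """Drop records with missing values and exact duplicates."""
    seen = set()
    cleaned = []
    for r in records:
        if None in r or r in seen:
            continue
        seen.add(r)
        cleaned.append(r)
    return cleaned

def min_max_normalize(values):
    """Rescale numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical selected records: (student, marks)
raw = [("asha", 40), ("ravi", None), ("asha", 40), ("meena", 90), ("john", 65)]
rows = clean(raw)
print(rows)                                  # [('asha', 40), ('meena', 90), ('john', 65)]
print(min_max_normalize([m for _, m in rows]))  # [0.0, 1.0, 0.5]
```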

4. Data mining:

Various techniques, such as association rules, classification, clustering and regression, are applied to extract data patterns.
5. Pattern evaluation:
The different data patterns generated by data mining are evaluated using metrics.
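A minimal sketch of one such metric, assuming the mined pattern is a classifier and that the true labels of some test records are known (both invented here). Accuracy is the fraction of predictions that match the actual values; other patterns use other metrics, such as support and confidence for association rules.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the known (actual) values."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

# Hypothetical classifier output on 5 test records vs. the true labels
predicted = [True, False, True, True, False]
actual = [True, False, False, True, False]
print(accuracy(predicted, actual))  # 0.8
```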
6. Knowledge representation:
This is the final step of KDD: the extracted knowledge is presented in the form the user requires.

Issues in data mining

Data mining systems face many challenges (issues) in today's world. Some of them are:

1. Mining methodology:
Users should know which methodology is used to mine the data.
2. Issues related to handling different types of data:
Data is not in a single format; it may be images, documents, XML, JPG files, etc.
Handling so many different types of data can cause issues.
3. Performance:
Once the data and its type are defined, the next major concern is the performance of the operation.
Generally, performance is measured in terms of efficiency, effectiveness and scalability.
4. Incorporation of background knowledge:
If users do not have domain (subject) knowledge, they cannot find the right workflow and solution. That is why we should first acquire background knowledge of the particular domain.
5. Pattern evaluation:
 Before retrieving patterns, users should know the relationships between the attributes.
 This makes it easier to retrieve patterns and judge which ones are useful.
6. Handling noisy and incomplete data:

 When data is collected from different sources, there is a chance of getting noisy, distorted, corrupted or incomplete (unfilled) data.

 Handling such data is another issue.


7. Parallel, distributed and incremental mining methods:
 When the data is distributed across different sources and is updated in parallel, parallel and distributed mining methods are needed. Incremental mining methods update the existing results with the new data instead of mining all the data again.
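A minimal sketch of the incremental idea, assuming the mining result is the set of frequent items (items appearing in at least a minimum fraction of transactions) and that new transactions arrive in hypothetical batches. Only running counts are kept, so each new batch updates the result without rescanning the old data.

```python
from collections import Counter

class IncrementalItemCounter:
    """Keeps running item counts so each new batch updates the result incrementally."""

    def __init__(self):
        self.counts = Counter()
        self.transactions = 0

    def update(self, batch):
        for t in batch:
            self.counts.update(set(t))  # count each item once per transaction
        self.transactions += len(batch)

    def frequent_items(self, min_support):
        if self.transactions == 0:
            return []
        return sorted(item for item, c in self.counts.items()
                      if c / self.transactions >= min_support)

counter = IncrementalItemCounter()
counter.update([{"bread", "milk"}, {"bread"}])        # first batch
print(counter.frequent_items(0.6))                     # ['bread']
counter.update([{"milk"}, {"milk", "butter"}])         # new data arrives later
print(counter.frequent_items(0.6))                     # ['milk']
```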
8. Integration of the discovered knowledge with existing one:
 This is called knowledge fusion: newly discovered knowledge must be combined with what is already known, so we should first know the details of the existing application.

Data mining metrics

Metrics are a set of measurements that help determine the efficiency of a data mining method / algorithm.
They help us decide / choose the right data mining algorithm.
Each data mining method has its own metrics.
For example, in web mining the metrics include website traffic, visitors, pages served, queries, etc.
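A minimal sketch of computing some of the web mining metrics mentioned above, assuming a hypothetical access log of (visitor id, requested page) pairs and assuming that search queries are the requests whose path starts with `/search`.

```python
# Hypothetical access log: (visitor_id, page) pairs
log = [
    ("v1", "/home"), ("v2", "/home"), ("v1", "/books"),
    ("v3", "/search?q=dwh"), ("v2", "/books"), ("v1", "/home"),
]

def web_metrics(log):
    """Count pages served, distinct visitors and search queries in the log."""
    visitors = {visitor for visitor, _ in log}
    queries = [page for _, page in log if page.startswith("/search")]
    return {
        "pages_served": len(log),
        "unique_visitors": len(visitors),
        "queries": len(queries),
    }

print(web_metrics(log))  # {'pages_served': 6, 'unique_visitors': 3, 'queries': 1}
```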
