
Data Mining

DW Architecture

Source: Sharda et al. (2018)


Issues to consider when deciding which architecture to use:
► Which database management system (DBMS) should be used?
  Selection of vendors
► Will parallel processing and/or partitioning be used?
  Improves scalability
► Will data migration tools be used to load the data warehouse?
► What tools will be used to support data retrieval and analysis?
  Selecting in-house tools or third-party tools
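Partitioning, listed above as a way to improve scalability, can take several forms. A minimal sketch of one of them, hash partitioning, is shown here: rows are assigned to a fixed number of partitions by hashing a key column, so each partition can be stored or scanned independently. The table, the `customer_id` column, and the partition count are hypothetical; a real DBMS does this internally rather than in application code.

```python
# Minimal hash-partitioning sketch. The data, the key column, and
# NUM_PARTITIONS are illustrative assumptions, not a DBMS feature.
from zlib import crc32

NUM_PARTITIONS = 4  # hypothetical partition count

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition number using a stable hash (crc32)."""
    return crc32(key.encode("utf-8")) % num_partitions

def hash_partition(rows, key_column, num_partitions=NUM_PARTITIONS):
    """Split rows (dicts) into num_partitions lists by hashing key_column."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[partition_for(str(row[key_column]), num_partitions)].append(row)
    return partitions

if __name__ == "__main__":
    sales = [
        {"customer_id": "C001", "amount": 120.0},
        {"customer_id": "C002", "amount": 75.5},
        {"customer_id": "C003", "amount": 310.0},
        {"customer_id": "C001", "amount": 42.0},
    ]
    for i, part in enumerate(hash_partition(sales, "customer_id")):
        print(i, [r["customer_id"] for r in part])
```

`zlib.crc32` is used instead of Python's built-in `hash()` because string hashing is randomized per process, whereas a partitioning scheme must send the same key to the same partition every time.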
Alternative DW architectures

Supported by Ralph Kimball

Source: Sharda et al. (2018)


Supported by Bill Inmon

Source: Sharda et al. (2018)


Source: Sharda et al. (2018)
Ten Factors that Potentially Affect the Architecture Selection Decision
► Information interdependence between organizational units
► Upper management’s information needs
► Urgency of need for a data warehouse
► Nature of end-user tasks
► Constraints on resources
► Strategic view of the data warehouse prior to implementation
► Compatibility with existing systems
► Perceived ability of the in-house IT staff
► Technical issues
► Social/political factors

Source: Ariyachandra and Watson (2005), Sharda et al. (2018)

DW development

Kimball model vs. Inmon model

Source: Sharda et al. (2018)


Additional factors to be considered for the development of a DW
► Requires minimal investment in infrastructure
► Frees up capacity on in-house systems
► Frees up cash flow
► Makes powerful solutions affordable
► Enables solutions that provide for growth
► Offers better quality equipment and software
► Provides faster connections

Source: Sharda et al. (2018)


Data representation in DW

Source: Sharda et al. (2018)
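The figure for this slide is not reproduced here. One common way data are represented in a DW is the dimensional model, in which a central fact table of numeric measures references surrounding dimension tables (a star schema). The sketch below assumes that representation and uses Python's built-in `sqlite3` module; all table names, columns, and values are hypothetical.

```python
# Minimal star-schema sketch with sqlite3: one fact table (sales_fact)
# referencing two dimension tables (date_dim, product_dim).
# All names and values are hypothetical.
import sqlite3

def build_star_schema() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE date_dim (
            date_key INTEGER PRIMARY KEY,
            year     INTEGER,
            quarter  TEXT
        );
        CREATE TABLE product_dim (
            product_key INTEGER PRIMARY KEY,
            name        TEXT,
            category    TEXT
        );
        -- Fact table: measures plus a foreign key to each dimension
        CREATE TABLE sales_fact (
            date_key    INTEGER REFERENCES date_dim(date_key),
            product_key INTEGER REFERENCES product_dim(product_key),
            units_sold  INTEGER,
            revenue     REAL
        );
        INSERT INTO date_dim VALUES (1, 2018, 'Q1'), (2, 2018, 'Q2');
        INSERT INTO product_dim VALUES
            (10, 'Laptop', 'Computers'),
            (20, 'Phone',  'Mobile');
        INSERT INTO sales_fact VALUES
            (1, 10, 5, 5000.0),
            (1, 20, 8, 4000.0),
            (2, 10, 3, 3000.0),
            (2, 20, 10, 5000.0);
        """
    )
    return conn

def revenue_by_quarter_and_category(conn):
    """Aggregate the fact table along two dimensions (a typical OLAP query)."""
    return conn.execute(
        """
        SELECT d.quarter, p.category, SUM(f.revenue)
        FROM sales_fact AS f
        JOIN date_dim    AS d ON f.date_key = d.date_key
        JOIN product_dim AS p ON f.product_key = p.product_key
        GROUP BY d.quarter, p.category
        ORDER BY d.quarter, p.category
        """
    ).fetchall()

if __name__ == "__main__":
    for row in revenue_by_quarter_and_category(build_star_schema()):
        print(row)
```

Each query joins the fact table to only the dimensions it needs, which is why this layout suits slicing and aggregating along business dimensions such as time and product.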


Data Mining

The nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data stored in structured databases (Fayyad et al., 1996).

Source: Sharda et al. (2018)


Data Mining Characteristics and Objectives

► The data warehouse is the primary data source for data mining, but other data sources can also be used.
► Data mining typically runs on a client-server or web-based architecture.
► The data can be structured or unstructured.
► The data miner is often an end user with little or no programming skill; sophisticated tools ease the data extraction process.
► Creative thinking helps make sense of the findings.
► Data mining tools are readily combined with spreadsheets and other software development tools.
► Because the data sets can be very large and the search effort massive, parallel processing is sometimes necessary for data mining.
Source: Sharda et al. (2018)
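The last point, that large data sets sometimes call for parallel processing, can be illustrated with a data-parallel item count: transactions are split into chunks, each worker counts items in its chunk (map), and the partial counts are merged (reduce). This is a minimal sketch using Python's `concurrent.futures`; the basket data, function names, and chunking scheme are hypothetical and not taken from the source.

```python
# Data-parallel item counting: split transactions into chunks, count items
# per chunk on separate workers (map), then merge partial counts (reduce).
# All data, names, and chunking choices here are illustrative assumptions.
import concurrent.futures as cf
from collections import Counter
from typing import List

def count_items(chunk: List[List[str]]) -> Counter:
    """Map step: how many transactions in this chunk contain each item."""
    counts = Counter()
    for transaction in chunk:
        counts.update(set(transaction))  # count an item at most once per transaction
    return counts

def split_into_chunks(data: List, n_chunks: int) -> List[List]:
    """Divide data into at most n_chunks contiguous slices of similar size."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def parallel_item_counts(transactions: List[List[str]], workers: int = 4,
                         executor_cls=cf.ProcessPoolExecutor) -> Counter:
    """Count item support across all transactions using a pool of workers."""
    total = Counter()
    with executor_cls(max_workers=workers) as pool:
        for partial in pool.map(count_items, split_into_chunks(transactions, workers)):
            total.update(partial)  # reduce step: add this chunk's counts
    return total
```

With the default `ProcessPoolExecutor`, call `parallel_item_counts` from inside an `if __name__ == "__main__":` guard on platforms that start workers with spawn (Windows, macOS). Passing `executor_cls=cf.ThreadPoolExecutor` runs the same logic on threads, which is convenient for testing but, because of CPython's global interpreter lock, does not speed up this pure-Python counting.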
Data Mining process
1. CRISP-DM

Source: Sharda et al. (2018)
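CRISP-DM consists of six phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment. Its process diagram includes feedback arrows, so the phases are not a strict waterfall. The sketch below encodes the phases and a simplified set of allowed transitions; the transition table is an approximation of that diagram, and `is_valid_path` is a hypothetical helper for illustration only.

```python
# Sketch of the six CRISP-DM phases and simplified transitions between them.
# The transition table approximates the CRISP-DM diagram; the helper is hypothetical.
from enum import Enum

class Phase(Enum):
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6

# Forward steps plus the feedback loops that make CRISP-DM iterative.
NEXT_PHASES = {
    Phase.BUSINESS_UNDERSTANDING: {Phase.DATA_UNDERSTANDING},
    Phase.DATA_UNDERSTANDING: {Phase.BUSINESS_UNDERSTANDING, Phase.DATA_PREPARATION},
    Phase.DATA_PREPARATION: {Phase.MODELING},
    Phase.MODELING: {Phase.DATA_PREPARATION, Phase.EVALUATION},
    Phase.EVALUATION: {Phase.BUSINESS_UNDERSTANDING, Phase.DEPLOYMENT},
    # Deployment ends one pass; the outer circle of the diagram indicates
    # that a new cycle may follow as a separate project.
    Phase.DEPLOYMENT: set(),
}

def is_valid_path(path):
    """Return True if every consecutive pair of phases is an allowed transition."""
    return all(b in NEXT_PHASES[a] for a, b in zip(path, path[1:]))
```

For example, `is_valid_path([Phase.MODELING, Phase.DATA_PREPARATION, Phase.MODELING])` accepts the common loop of returning to data preparation after a first modeling attempt.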


Data Preprocessing steps

Source: Sharda et al. (2018)
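The preprocessing figure is not reproduced here. Assuming the commonly used four-stage sequence of data consolidation, data cleaning, data transformation, and data reduction, a toy pipeline can show what each stage does to a handful of records. The records, field names, and the specific rule applied at each stage (duplicate removal, median imputation, min-max scaling, attribute selection) are hypothetical choices for illustration.

```python
# Toy preprocessing pipeline: consolidate -> clean -> transform -> reduce.
# Records, field names, and the rule used at each step are hypothetical.
from statistics import median

def consolidate(*sources):
    """Consolidation: merge records from several sources, dropping exact duplicates."""
    seen, merged = set(), []
    for source in sources:
        for rec in source:
            key = tuple(sorted(rec.items()))  # hashable fingerprint of the record
            if key not in seen:
                seen.add(key)
                merged.append(dict(rec))
    return merged

def clean(records):
    """Cleaning: impute missing ages (None) with the median of the known ages."""
    fill = median(r["age"] for r in records if r["age"] is not None)
    return [{**r, "age": fill if r["age"] is None else r["age"]} for r in records]

def transform(records):
    """Transformation: min-max normalize income to the range [0, 1]."""
    incomes = [r["income"] for r in records]
    lo, hi = min(incomes), max(incomes)
    span = (hi - lo) or 1  # avoid dividing by zero when all incomes are equal
    return [{**r, "income": (r["income"] - lo) / span} for r in records]

def reduce_attributes(records, keep=("age", "income")):
    """Reduction: keep only the attributes needed for modeling."""
    return [{k: r[k] for k in keep} for r in records]

def preprocess(*sources):
    """Run the four stages in order on records from one or more sources."""
    return reduce_attributes(transform(clean(consolidate(*sources))))
```

Each stage is a separate function so that it can be inspected or replaced on its own, mirroring the way the stages are presented as distinct steps.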


Overview of CRISP-DM

Source: https://www.youtube.com/watch?v=ar2J0pX0T3M
2. SEMMA

Source: Sharda et al. (2018)
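SEMMA, developed by SAS Institute, names five steps: Sample, Explore, Modify, Model, and Assess. The following minimal sketch walks a toy one-feature, two-class dataset through those steps. The dataset, the decision-stump model, and all function names are hypothetical choices made only to make each step concrete; SEMMA itself does not prescribe any particular model.

```python
# Minimal SEMMA walk-through on a toy one-feature, two-class dataset.
# The data, the decision-stump model, and every name here are illustrative.
import random
from statistics import mean

def sample(data, fraction, seed=42):
    """Sample: shuffle reproducibly, then split into training and holdout sets."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * fraction)
    return rows[:cut], rows[cut:]

def explore(rows):
    """Explore: mean feature value for each class label."""
    labels = {y for _, y in rows}
    return {lbl: mean(x for x, y in rows if y == lbl) for lbl in labels}

def modify(rows, lo, hi):
    """Modify: min-max rescale the feature using training-set bounds lo and hi."""
    span = (hi - lo) or 1
    return [((x - lo) / span, y) for x, y in rows]

def accuracy(rows, threshold):
    """Share of rows where 'x >= threshold' matches the label (1 = positive)."""
    return mean(1 if (x >= threshold) == (y == 1) else 0 for x, y in rows)

def model(rows):
    """Model: pick the threshold (decision stump) with the best training accuracy."""
    return max((x for x, _ in rows), key=lambda t: accuracy(rows, t))

def run_semma(data, fraction=0.7, seed=42):
    train, holdout = sample(data, fraction, seed)              # Sample
    class_means = explore(train)                               # Explore
    lo, hi = min(x for x, _ in train), max(x for x, _ in train)
    train_m, holdout_m = modify(train, lo, hi), modify(holdout, lo, hi)  # Modify
    threshold = model(train_m)                                 # Model
    train_acc = accuracy(train_m, threshold)                   # Assess
    holdout_acc = accuracy(holdout_m, threshold)
    return class_means, threshold, train_acc, holdout_acc
```

The holdout set is rescaled with the training bounds rather than its own, so the Assess step measures performance on data the model has not seen.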
