Running head: DATA MINING

Week 5 Management Support Systems

Zhe Shen

MCAP351 Management Support Systems

Professor Dennis Hermann


1. What are the major data mining processes?

Several standard data mining processes have been identified, including KDD, SEMMA, and

CRISP-DM.

2. Why do you think the early phases (understanding of the business and

understanding of the data) take the longest in data mining projects?

The early phases take the longest in data mining projects because they primarily involve

learning. Learning and understanding occur in these phases and, as a result, they cannot be

automated. Considerable time must be spent understanding the business as well as the data,

because any mistake made at these stages affects the entire data mining project (Olson,

Shi & Shi, 2007).

3. List and briefly define the phases in the CRISP-DM process.

Business understanding

This phase primarily involves understanding the business objectives and assessing the current

situation, including the available resources, constraints, and assumptions. It also involves

defining the data mining objectives and developing a data mining plan (Turban, Sharda & Delen,

2015).


Data understanding

This phase involves collecting the available data, becoming familiar with it, and examining its

gross or surface properties. The quality of the data is also examined, for instance to identify

missing values (Olson, Shi & Shi, 2007).

Data preparation

Data preparation takes most of the project's time, and its outcome is the final data set. Once

the available data sources have been identified, the data are constructed and formatted as

required.


Modeling

In this phase, the modeling techniques are selected and validated. Models are then obtained by

running the modeling tool on the prepared data set, and they are examined to ensure that they

meet the business objectives (Olson, Shi & Shi, 2007).


Evaluation

The model results are evaluated against the business objectives identified during the first

phase. The decision whether or not to proceed is made during this phase.


Deployment

The knowledge obtained from the data mining process is presented in a form that the

stakeholders can use. The final report generated during this phase should provide an overview

of the experience gained during the data mining process and identify areas for improvement.
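
The order of the six phases, and the proceed-or-not decision made during evaluation, can be illustrated with a short Python sketch. The names Phase and next_phase are hypothetical, and the rule that a rejected evaluation returns the project to business understanding is an assumption made for this sketch rather than something stated in the answer above.

```python
from enum import IntEnum


class Phase(IntEnum):
    """The six CRISP-DM phases, numbered in their nominal order."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


def next_phase(current, proceed=True):
    """Return the phase that follows `current`, or None after deployment.

    `proceed` only matters at EVALUATION: if the results do not satisfy
    the business objectives, this sketch assumes the project goes back
    to BUSINESS_UNDERSTANDING.
    """
    if current is Phase.EVALUATION and not proceed:
        return Phase.BUSINESS_UNDERSTANDING
    if current is Phase.DEPLOYMENT:
        return None
    return Phase(current + 1)


# Walk the nominal path once, from start to finish.
path = []
phase = Phase.BUSINESS_UNDERSTANDING
while phase is not None:
    path.append(phase.name)
    phase = next_phase(phase)
print(" -> ".join(path))
```

Calling next_phase(Phase.EVALUATION, proceed=False) returns Phase.BUSINESS_UNDERSTANDING, which models a project that is sent back for another iteration instead of being deployed.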

4. What are the main data preprocessing steps? Briefly describe each step and provide

relevant examples.

Data integration: entails collecting, selecting, and integrating data. This step consolidates

various databases and identifies inconsistencies as well as redundancies (Han, Pei &

Kamber, 2011).

Data cleaning: this step involves cleaning the data to handle missing values, reduce noise, and

eliminate inconsistencies. For example, correcting errors as well as filling in missing

values.


Data transformation: involves normalizing the data as well as constructing new attributes.

For instance, this step is associated with aggregation as well as normalization (Olson, Shi &

Shi, 2007).

Data reduction: entails producing a lower volume of data that still yields the same analytical

results. This is achieved, for instance, by balancing skewed data or by reducing the number of

attributes as well as records.
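
The four preprocessing steps above can be illustrated on a toy data set. The following Python sketch uses only the standard library; the records, field names, and the choice of mean imputation and min-max normalization are assumptions made for illustration, not methods prescribed by the cited sources.

```python
# Hypothetical toy records from two sources; all names and values are made up.
customers = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 2, "age": None, "income": 61000},   # missing age
    {"id": 3, "age": 45, "income": 80000},
    {"id": 3, "age": 45, "income": 80000},     # redundant duplicate
]
purchases = {1: 120.0, 2: 80.0, 3: 200.0}      # id -> total spend

# 1. Integration: combine both sources on the shared key and drop
#    redundant records.
merged, seen = [], set()
for row in customers:
    if row["id"] not in seen:
        seen.add(row["id"])
        merged.append({**row, "spend": purchases[row["id"]]})

# 2. Cleaning: fill the missing age with the mean of the known ages.
known_ages = [r["age"] for r in merged if r["age"] is not None]
mean_age = sum(known_ages) / len(known_ages)
for r in merged:
    if r["age"] is None:
        r["age"] = mean_age

# 3. Transformation: min-max normalize income to [0, 1] and construct a
#    new attribute (spend as a share of income).
lo = min(r["income"] for r in merged)
hi = max(r["income"] for r in merged)
for r in merged:
    r["income_norm"] = (r["income"] - lo) / (hi - lo)
    r["spend_ratio"] = r["spend"] / r["income"]

# 4. Reduction: keep only the attributes needed for modeling.
keep = ("id", "age", "income_norm", "spend_ratio")
reduced = [{k: r[k] for k in keep} for r in merged]

for r in reduced:
    print(r)
```

In this sketch the duplicate record for id 3 is removed during integration, the missing age is replaced by 39.5 (the mean of 34 and 45), and the reduced data set keeps four attributes per record.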

5. How does CRISP-DM differ from SEMMA?

There are several differences between SEMMA and CRISP-DM. SEMMA was developed primarily

for SAS Enterprise Miner and was not designed for the general business environment, whereas

CRISP-DM is applicable across the various data mining tools used in industry. CRISP-DM is a

six-phase model that covers the data mining process from start to finish (Olson, Shi & Shi,

2007). In contrast, SEMMA focuses on model development; it overlooks the initial stages

contained in CRISP-DM and omits the Deployment phase entirely.
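
The difference in coverage can be made concrete with a small Python sketch. SEMMA's five steps are Sample, Explore, Modify, Model, and Assess; the step-to-phase mapping below is an approximate correspondence assumed for illustration, not a mapping taken from the cited sources.

```python
CRISP_DM = [
    "Business understanding",
    "Data understanding",
    "Data preparation",
    "Modeling",
    "Evaluation",
    "Deployment",
]

# Assumed, approximate correspondence: SEMMA step -> closest CRISP-DM phase.
SEMMA_TO_CRISP = {
    "Sample": "Data understanding",
    "Explore": "Data understanding",
    "Modify": "Data preparation",
    "Model": "Modeling",
    "Assess": "Evaluation",
}

# CRISP-DM phases that no SEMMA step maps onto.
covered = set(SEMMA_TO_CRISP.values())
uncovered = [p for p in CRISP_DM if p not in covered]
print(uncovered)  # ['Business understanding', 'Deployment']
```

Under this assumed mapping, the phases left without a SEMMA counterpart are the first and the last, which matches the comparison above.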


References

Han, J., Pei, J., & Kamber, M. (2011). Data mining: Concepts and techniques. Elsevier.

Olson, D. L., Shi, Y., & Shi, Y. (2007). Introduction to business data mining (Vol. 10,

pp. 2250-2254). Englewood Cliffs: McGraw-Hill/Irwin.

Turban, E., Sharda, R., & Delen, D. (2015). Decision support and business intelligence systems.

Pearson Education India.