
Introduction

Swapnil Sakpal
Agenda
• To analyze existing house price data and use it to predict house prices
Overview of CRISP-DM
• CRISP-DM (Cross-Industry Standard Process for Data Mining) is a widely used methodology for guiding the data mining process. It provides a structured approach to tackling data mining projects and has become a de facto industry standard. CRISP-DM consists of six main phases, each with specific tasks and objectives:
Business Understanding
• In this initial phase, the project objectives and requirements are defined. The focus is on understanding the business problem, goals, and constraints that drive the data mining project.
Data Understanding
• This phase involves collecting and exploring the available data. Data sources are identified, data quality is assessed, and initial insights are gained through exploratory analysis. The goal is to become familiar with the data and understand its structure and content.
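As a rough illustration of assessing data quality, the sketch below profiles a handful of house records: for each column it counts missing values and reports the minimum, maximum, and mean of the values that are present. The column names (`area_sqft`, `bedrooms`, `price`) and the records themselves are hypothetical toy values, not data from this project.

```python
from statistics import mean

# Hypothetical toy records; column names and values are illustrative only.
houses = [
    {"area_sqft": 1200, "bedrooms": 2, "price": 250000},
    {"area_sqft": 1800, "bedrooms": 3, "price": 340000},
    {"area_sqft": None, "bedrooms": 3, "price": 310000},
    {"area_sqft": 2400, "bedrooms": None, "price": 450000},
]

def profile(records):
    """Summarize each column: missing count plus min/max/mean of present values."""
    columns = sorted({key for row in records for key in row})
    summary = {}
    for col in columns:
        values = [row.get(col) for row in records]
        present = [v for v in values if v is not None]
        summary[col] = {
            "missing": len(values) - len(present),
            "min": min(present) if present else None,
            "max": max(present) if present else None,
            "mean": mean(present) if present else None,
        }
    return summary

for col, stats in profile(houses).items():
    print(col, stats)
```

A profile like this is usually the first place missing values and implausible ranges show up, which feeds directly into the next phase.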
Data Preparation
• In this phase, the data is cleaned, transformed, and preprocessed to prepare it for modeling. Tasks include handling missing values, removing outliers, selecting relevant variables, and creating derived attributes. This phase is crucial because the quality and suitability of the data strongly affect the accuracy and effectiveness of the models.
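The four preparation tasks listed above can be sketched in plain Python as follows. The specific choices here are assumptions for illustration, not the method used in this project: median imputation for missing areas, the 1.5 × IQR rule for price outliers, a hypothetical derived attribute `area_per_bedroom`, and a fixed list of kept columns. All records are made-up toy values.

```python
from statistics import median, quantiles

# Hypothetical toy records; the last row has a suspiciously high price.
houses = [
    {"area_sqft": 1200, "bedrooms": 2, "price": 250000},
    {"area_sqft": 1800, "bedrooms": 3, "price": 340000},
    {"area_sqft": None, "bedrooms": 3, "price": 310000},
    {"area_sqft": 2400, "bedrooms": 4, "price": 450000},
    {"area_sqft": 1500, "bedrooms": 2, "price": 5000000},
]

def impute_median(records, col):
    """Replace missing values in `col` with the median of the present values."""
    fill = median(r[col] for r in records if r.get(col) is not None)
    return [{**r, col: fill if r.get(col) is None else r[col]} for r in records]

def remove_outliers_iqr(records, col, k=1.5):
    """Drop rows whose `col` value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles([r[col] for r in records], n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [r for r in records if low <= r[col] <= high]

def prepare(records):
    rows = impute_median(records, "area_sqft")
    rows = remove_outliers_iqr(rows, "price")
    # Derived attribute (illustrative choice): floor area per bedroom.
    rows = [{**r, "area_per_bedroom": r["area_sqft"] / r["bedrooms"]} for r in rows]
    # Variable selection: keep only the columns the model will use.
    keep = ("area_sqft", "bedrooms", "area_per_bedroom", "price")
    return [{c: r[c] for c in keep} for r in rows]

prepared = prepare(houses)
for row in prepared:
    print(row)
```

Here the missing area is filled with 1650.0 and the 5,000,000 row is dropped as an outlier. On real data, the imputation strategy and outlier threshold would need to be justified against the business context rather than taken as defaults.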
Modeling
• The modeling phase focuses on selecting and applying appropriate data mining techniques to build predictive or descriptive models. Different algorithms and approaches are explored, and model parameters are fine-tuned. Model performance is measured with suitable metrics; for a price-prediction (regression) task these are typically error measures such as MAE or RMSE.
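A minimal sketch of "exploring different algorithms and fine-tuning parameters" for house prices follows, under several assumptions. It uses a single hypothetical feature (area) and toy data. It compares three illustrative candidates: a mean-price baseline, ordinary least-squares linear regression, and k-nearest-neighbours regression with k tuned over 1 to 3. Each is scored by RMSE on a held-out validation split. These model choices are examples only, not the models used in this project.

```python
from math import sqrt
from statistics import mean

# Hypothetical (area_sqft, price) pairs; illustrative values only.
train = [(1000, 200000), (1200, 240000), (1500, 290000),
         (1800, 350000), (2100, 400000), (2400, 470000)]
valid = [(1300, 260000), (2000, 385000)]

def fit_mean(data):
    """Baseline: always predict the average training price."""
    avg = mean(p for _, p in data)
    return lambda area: avg

def fit_linear(data):
    """Ordinary least squares for price = a + b * area (one feature)."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    x_bar, y_bar = mean(xs), mean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in data) / sum((x - x_bar) ** 2 for x in xs)
    a = y_bar - b * x_bar
    return lambda area: a + b * area

def fit_knn(data, k):
    """k-nearest-neighbours regression: average price of the k closest areas."""
    def predict(area):
        nearest = sorted(data, key=lambda row: abs(row[0] - area))[:k]
        return mean(p for _, p in nearest)
    return predict

def rmse(model, data):
    return sqrt(mean((model(x) - y) ** 2 for x, y in data))

candidates = {"mean": fit_mean(train), "linear": fit_linear(train)}
for k in (1, 2, 3):  # tune the hyperparameter k on the validation split
    candidates[f"knn_k{k}"] = fit_knn(train, k)

scores = {name: rmse(model, valid) for name, model in candidates.items()}
best = min(scores, key=scores.get)
print(scores, "best:", best)
```

On this toy data the linear model scores best. With a validation set this small, the choice of split can easily change the ranking, so cross-validation is the usual safeguard.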
Evaluation
• The models developed in the previous phase are evaluated in terms of their quality and effectiveness. The results are compared against the project objectives and success criteria. If the models do not meet the requirements, the process may loop back to previous phases for further refinement.
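The comparison against success criteria, and the loop-back decision, can be sketched as a small check. The sketch computes MAE, RMSE, and mean absolute percentage error (MAPE) for a set of predictions. It then applies a success criterion that is purely an assumption for illustration: MAPE of at most 10%. The actual and predicted prices are hypothetical.

```python
from math import sqrt
from statistics import mean

def regression_metrics(actual, predicted):
    """Common regression error measures for price predictions."""
    errors = [p - a for a, p in zip(actual, predicted)]
    return {
        "mae": mean(abs(e) for e in errors),
        "rmse": sqrt(mean(e * e for e in errors)),
        "mape": mean(abs(e) / a for e, a in zip(errors, actual)),
    }

def meets_objectives(metrics, max_mape=0.10):
    """Hypothetical success criterion: mean absolute percentage error at most 10%."""
    return metrics["mape"] <= max_mape

# Hypothetical held-out prices and model predictions.
actual = [260000, 385000, 300000, 410000]
predicted = [255000, 390000, 330000, 400000]

metrics = regression_metrics(actual, predicted)
next_step = "deployment" if meets_objectives(metrics) else "revisit data preparation / modeling"
print(metrics, "->", next_step)
```

Expressing the success criterion as an explicit threshold, agreed on during Business Understanding, makes the "deploy or loop back" decision reproducible rather than a judgment call made after seeing the results.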
Deployment
• In this final phase, the selected models are deployed into the operational environment. This may involve integrating them into existing systems, creating user interfaces, or developing reports. Monitoring and maintenance plans are established to ensure the models continue to perform effectively over time.
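A minimal sketch of two deployment concerns: shipping a trained model and monitoring it over time. It assumes a simple linear model whose parameters are serialized to JSON so another system can load them without retraining. It also assumes a hypothetical monitoring rule that flags the model for retraining when the error on recent sales exceeds the validation-time MAE by more than 25%. The parameters, the tolerance, and the recent sales are all illustrative values.

```python
import json
from statistics import mean

# Hypothetical fitted parameters for price = intercept + slope * area_sqft.
params = {"intercept": 10000.0, "slope": 190.0}

# Serialize the model so the operational system can load it without retraining.
artifact = json.dumps(params)

def load_model(serialized):
    """Rebuild a prediction function from the serialized parameters."""
    p = json.loads(serialized)
    return lambda area: p["intercept"] + p["slope"] * area

def needs_retraining(actual, predicted, baseline_mae, tolerance=1.25):
    """Flag the model when live MAE exceeds the validation-time MAE by more than 25%."""
    live_mae = mean(abs(pred - act) for act, pred in zip(actual, predicted))
    return live_mae > tolerance * baseline_mae

model = load_model(artifact)
recent_areas = [1500, 2000]
recent_prices = [300000, 420000]  # hypothetical observed sale prices
predictions = [model(a) for a in recent_areas]
print(predictions, needs_retraining(recent_prices, predictions, baseline_mae=12500))
```

In this example the live MAE (17,500) exceeds 1.25 × 12,500, so the check recommends retraining. This is the kind of signal a maintenance plan would route back into the earlier CRISP-DM phases.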
