
Week 1: Business and Data Understanding

Unit 3: Introduction to Project Methodologies


Introduction to Project Methodologies
Why should there be a project methodology?

▪ The data science process must be reliable and repeatable by people with little data science background.
▪ A project methodology:
– Provides a framework for recording experience
– Allows projects to be replicated
– Provides an aid to project planning and management
– Is a “comfort factor” for new adopters
– Reduces dependency on “stars”
▪ Ultimately, the methodology must support the effective integration of data science into the organization.

© 2020 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 2


Introduction to Project Methodologies
Cross-industry standard process for data mining (CRISP-DM)

[Figure: the CRISP-DM cycle. The six phases (Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment) are arranged in a cycle around the data.]



Introduction to Project Methodologies
CRISP-DM – Phase 1: Business Understanding

Tasks (▪) and their outputs (–):
▪ Determine Business Objectives
– Background
– Business Objectives
– Business Success Criteria
▪ Assess Situation
– Inventory of Resources
– Requirements, Assumptions, & Constraints
– Risks & Contingencies
– Terminology
– Costs & Benefits
▪ Determine Data Science Goals
– Data Science Goals
– Data Science Success Criteria
▪ Produce Project Plan
– Project Plan
– Initial Assessment of Tools & Techniques
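Business and data science success criteria are most useful when they are measurable. The sketch below records a criterion as a metric plus a threshold so it can later be checked against observed results. The `SuccessCriterion` class, the churn scenario, and the thresholds are hypothetical illustrations, not part of CRISP-DM itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuccessCriterion:
    """A measurable target agreed during Business Understanding (hypothetical structure)."""
    name: str
    metric: str               # e.g. "recall" for a data science goal
    threshold: float          # the value that counts as success
    higher_is_better: bool = True

    def is_met(self, observed: float) -> bool:
        # Compare an observed metric value against the agreed threshold.
        if self.higher_is_better:
            return observed >= self.threshold
        return observed <= self.threshold


# Hypothetical churn project: one business criterion and one data science
# criterion derived from it.
business = SuccessCriterion("Reduce monthly churn", "churn_rate", 0.05, higher_is_better=False)
data_science = SuccessCriterion("Identify likely churners", "recall", 0.70)

print(business.is_met(0.04))      # True
print(data_science.is_met(0.65))  # False
```

Writing the data science criterion down next to the business criterion it serves makes it easier to check later, during Evaluation, whether a technically good model also meets the business goal.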



Introduction to Project Methodologies
CRISP-DM – Phase 2: Data Understanding

Tasks (▪) and their outputs (–):
▪ Collect Initial Data
– Initial Data Collection Report
▪ Describe Data
– Data Description Report
▪ Explore Data
– Data Exploration Report
▪ Verify Data Quality
– Data Quality Report
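As an illustration of the Verify Data Quality task, here is a minimal sketch that produces a very small data quality report: missing values per column and exact duplicate records. It assumes the data arrives as a list of Python dictionaries; the function name and the sample records are hypothetical, and a real report would usually cover more checks (ranges, types, consistency).

```python
from collections import Counter


def data_quality_report(rows: list[dict]) -> dict:
    """Summarize basic quality issues in a list of records (a minimal sketch)."""
    columns = sorted({key for row in rows for key in row})
    # Count values that are absent or None, per column.
    missing = {col: sum(1 for row in rows if row.get(col) is None) for col in columns}
    # Count exact duplicate records (same columns and same values).
    signatures = Counter(tuple(sorted(row.items())) for row in rows)
    duplicates = sum(count - 1 for count in signatures.values() if count > 1)
    return {"rows": len(rows), "missing": missing, "duplicate_rows": duplicates}


# Hypothetical customer records with a missing age, a missing region, and a duplicate.
records = [
    {"customer_id": 1, "age": 34, "region": "EMEA"},
    {"customer_id": 2, "age": None, "region": "APJ"},
    {"customer_id": 2, "age": None, "region": "APJ"},
    {"customer_id": 3, "age": 51},
]
report = data_quality_report(records)
print(report)
# {'rows': 4, 'missing': {'age': 2, 'customer_id': 0, 'region': 1}, 'duplicate_rows': 1}
```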



Introduction to Project Methodologies
CRISP-DM – Phase 3: Data Preparation

Phase outputs: Dataset; Dataset Description

Tasks (▪) and their outputs (–):
▪ Select Data
– Rationale for Inclusion/Exclusion
▪ Clean Data
– Data Cleaning Report
▪ Construct Data
– Derived Attributes
– Generated Records
▪ Integrate Data
– Merged Data
▪ Format Data
– Reformatted Data
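The five Data Preparation tasks can be sketched on toy records as follows, one comment per task in the order listed above. The two sources (`CUSTOMERS`, `ORDERS`), the fixed reference date, and the output column layout `COLUMNS` are hypothetical assumptions made only for this illustration.

```python
from datetime import date

# Hypothetical raw inputs from two sources.
CUSTOMERS = [
    {"id": 1, "name": " Ada ", "signup": "2019-03-01", "notes": "vip"},
    {"id": 2, "name": "Bo", "signup": None, "notes": ""},
]
ORDERS = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 80.0},
    {"customer_id": 2, "amount": 15.5},
]
COLUMNS = ("id", "name", "tenure_days", "total_spend")  # assumed layout for the modeling step


def prepare(customers, orders, today=date(2020, 1, 1)):
    """Apply the five Data Preparation tasks to toy records (illustrative only)."""
    dataset = []
    for customer in customers:
        # Select data: keep only the attributes judged relevant ("notes" is excluded).
        row = {key: customer[key] for key in ("id", "name", "signup")}
        # Clean data: trim stray whitespace and exclude records with no signup date.
        row["name"] = row["name"].strip()
        if row["signup"] is None:
            continue
        # Construct data: derive tenure in days from the signup date.
        row["tenure_days"] = (today - date.fromisoformat(row["signup"])).days
        # Integrate data: merge the total order value from the second source.
        row["total_spend"] = sum(
            order["amount"] for order in orders if order["customer_id"] == customer["id"]
        )
        # Format data: emit a tuple in the fixed column order the modeling step expects.
        dataset.append(tuple(row[col] for col in COLUMNS))
    return dataset


print(prepare(CUSTOMERS, ORDERS))  # [(1, 'Ada', 306, 200.0)]
```

In a real project each of these steps would also feed its documented output, for example the rationale for excluding the customer without a signup date belongs in the data cleaning report.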



Introduction to Project Methodologies
CRISP-DM – Phase 4: Modeling

Tasks (▪) and their outputs (–):
▪ Select Modeling Technique
– Modeling Technique
– Modeling Assumptions
▪ Generate Test Design
– Test Design
▪ Build Model
– Parameter Settings
– Models
– Model Description
▪ Assess Model
– Model Assessment
– Revised Parameter Settings
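A minimal sketch of three Modeling tasks, assuming a simple holdout test design: a reproducible random train/test split (Generate Test Design), a majority-class baseline standing in for a real modeling technique (Build Model), and accuracy on the held-out records (Assess Model). The function names, the 25% test share, the seed, and the toy data are all hypothetical choices for illustration.

```python
import random
from collections import Counter


def holdout_split(records, test_fraction=0.25, seed=42):
    """Generate Test Design: a reproducible random train/test split."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def build_majority_model(train):
    """Build Model: always predict the most common training label (a baseline)."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda features: majority


def accuracy(model, test):
    """Assess Model: fraction of test records whose label is predicted correctly."""
    return sum(model(features) == label for features, label in test) / len(test)


# Hypothetical data: 20 customers, every fourth one churned (5 churn, 15 stay).
data = [((i,), "churn" if i % 4 == 0 else "stay") for i in range(20)]
train, test = holdout_split(data)
model = build_majority_model(train)
print(len(train), len(test), model(None))  # 15 5 stay
print(accuracy(model, test))
```

Fixing the seed makes the test design repeatable, which supports the replication goal of a project methodology; a baseline like this gives later models a reference point during Assess Model.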



Introduction to Project Methodologies
CRISP-DM – Phase 5: Evaluation

Tasks (▪) and their outputs (–):
▪ Evaluate Results
– Assessment of Data Mining Results
– Approved Model
▪ Review Process
– Review of Process
▪ Determine Next Steps
– List of Possible Actions
– Decision
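Evaluation judges the model against the business objectives rather than only technical metrics. One way to sketch that, assuming a simple cost/benefit model from the Business Understanding phase, is to convert confusion-matrix counts into a monetary figure and compare it with a business target. The value per retained customer, the contact cost, the target, and the decision rule are hypothetical numbers and a deliberately simplified rule.

```python
def net_benefit(tp: int, fp: int, value_per_save: float, cost_per_contact: float) -> float:
    """Translate a confusion-matrix result into money (hypothetical cost/benefit model)."""
    # Every flagged customer (tp + fp) is contacted; only true positives yield value.
    return tp * value_per_save - (tp + fp) * cost_per_contact


def decide(benefit: float, target: float) -> str:
    """Determine Next Steps: approve the model or revisit earlier phases (simplified rule)."""
    return "approve for deployment" if benefit >= target else "iterate or stop"


benefit = net_benefit(tp=120, fp=300, value_per_save=50.0, cost_per_contact=5.0)
print(benefit)                         # 3900.0
print(decide(benefit, target=5000.0))  # iterate or stop
```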



Introduction to Project Methodologies
CRISP-DM – Phase 6: Deployment

Tasks (▪) and their outputs (–):
▪ Plan Deployment
– Deployment Plan
▪ Plan Monitoring & Maintenance
– Monitoring & Maintenance Plan
▪ Produce Final Report
– Final Report
– Final Presentation
▪ Review Project
– Experience Documentation



Monitoring and Maintenance
CRISP-DM – Update

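[Figure: the updated CRISP-DM cycle. A Monitoring phase is added after Deployment, so the cycle runs Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment, Monitoring, and back to Business Understanding, around the data.]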
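Monitoring a deployed model often means checking whether production data still looks like the training data. One commonly used measure, swapped in here purely as an example, is the population stability index (PSI) over binned distributions: the sum over bins of (actual − expected) × ln(actual / expected). The bin proportions below are hypothetical, and the 0.1 / 0.25 thresholds are a widely quoted rule of thumb, not a standard.

```python
import math


def population_stability_index(expected, actual, eps=1e-6):
    """PSI between two binned distributions, each given as proportions per bin."""
    psi = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # guard against log(0) for empty bins
        psi += (a - e) * math.log(a / e)
    return psi


def drift_status(psi):
    """Classify PSI with a rule-of-thumb scale (thresholds are conventions, not standards)."""
    if psi < 0.1:
        return "stable"
    if psi < 0.25:
        return "moderate shift"
    return "significant shift"


# Hypothetical score distribution over four bins: at training time vs. in production.
baseline = [0.25, 0.25, 0.25, 0.25]
current = [0.10, 0.20, 0.30, 0.40]
psi = population_stability_index(baseline, current)
print(round(psi, 3), drift_status(psi))  # 0.228 moderate shift
```

A "significant shift" result would be a natural trigger to return to Business Understanding or Data Preparation and retrain, which is the loop the Monitoring phase closes.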



Introduction to Project Methodologies
Summary

▪ This unit introduced the most popular project methodology for data science: CRISP-DM.
▪ CRISP-DM has six key phases, and each phase includes a number of tasks and outputs.
▪ It is important to follow a project methodology on a data science project, so that you understand the order of the phases and the tasks you must consider in each one.
▪ Different data science projects have different requirements, so use this methodology as a template to make sure you have considered every aspect specific to your project.
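To use CRISP-DM as a template in practice, the phases and tasks from this unit can be kept as a simple checklist. The sketch below stores them in a dictionary and reports which tasks are still open; the data structure and the `open_tasks` helper are hypothetical conveniences, not an SAP or CRISP-DM tool.

```python
# Phases and tasks as listed in this unit, in order.
CRISP_DM_TASKS = {
    "Business Understanding": ["Determine Business Objectives", "Assess Situation",
                               "Determine Data Science Goals", "Produce Project Plan"],
    "Data Understanding": ["Collect Initial Data", "Describe Data",
                           "Explore Data", "Verify Data Quality"],
    "Data Preparation": ["Select Data", "Clean Data", "Construct Data",
                         "Integrate Data", "Format Data"],
    "Modeling": ["Select Modeling Technique", "Generate Test Design",
                 "Build Model", "Assess Model"],
    "Evaluation": ["Evaluate Results", "Review Process", "Determine Next Steps"],
    "Deployment": ["Plan Deployment", "Plan Monitoring & Maintenance",
                   "Produce Final Report", "Review Project"],
}


def open_tasks(done: set[str]) -> dict[str, list[str]]:
    """Return, per phase and in phase order, the tasks not yet marked as done."""
    return {
        phase: [task for task in tasks if task not in done]
        for phase, tasks in CRISP_DM_TASKS.items()
        if any(task not in done for task in tasks)
    }


done = set(CRISP_DM_TASKS["Business Understanding"]) | {"Collect Initial Data"}
remaining = open_tasks(done)
print(next(iter(remaining)), remaining["Data Understanding"])
# Data Understanding ['Describe Data', 'Explore Data', 'Verify Data Quality']
```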



Thank you.
Contact information:

open@sap.com

www.sap.com/contactsap

© 2020 SAP SE or an SAP affiliate company. All rights reserved.


No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of
SAP SE or an SAP affiliate company.
The information contained herein may be changed without prior notice. Some software products marketed by SAP SE and its
distributors contain proprietary software components of other software vendors. National product specifications may vary.
These materials are provided by SAP SE or an SAP affiliate company for informational purposes only, without representation or
warranty of any kind, and SAP or its affiliated companies shall not be liable for errors or omissions with respect to the materials.
The only warranties for SAP or SAP affiliate company products and services are those that are set forth in the express warranty
statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional
warranty.
In particular, SAP SE or its affiliated companies have no obligation to pursue any course of business outlined in this document or
any related presentation, or to develop or release any functionality mentioned therein. This document, or any related presentation,
and SAP SE’s or its affiliated companies’ strategy and possible future developments, products, and/or platforms, directions, and
functionality are all subject to change and may be changed by SAP SE or its affiliated companies at any time for any reason
without notice. The information in this document is not a commitment, promise, or legal obligation to deliver any material, code, or
functionality. All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ
materially from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements, and they
should not be relied upon in making purchasing decisions.
SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered
trademarks of SAP SE (or an SAP affiliate company) in Germany and other countries. All other product and service names
mentioned are the trademarks of their respective companies.
See www.sap.com/copyright for additional trademark information and notices.
