
Data Science Methodology

This document outlines a general data science methodology that provides a guiding strategy for data science processes and activities. It describes each step of the methodology including business understanding, analytic approach, data requirements, data collection/preparation, modeling, evaluation, deployment, and feedback.

Uploaded by

Mohammed Adnan
  • Foundational Data Science Methodology: Introduces the foundational concepts and importance of data science methodologies in modern analytics.
  • Introduction: Explains the interest in data science and outlines rapidly evolving technologies related to the field.
  • Data Science Methodology: Defines the data science methodology and its purpose in guiding strategies across domains.
  • Methodology Diagram: Visual representation of the data science process, illustrating steps from business understanding to feedback.
  • Business Understanding: Details the initial phase of projects focusing on defining objectives and requirements from a business perspective.
  • Analytic Approach: Describes the approach to solving business problems using statistical and machine learning techniques.
  • Data Compilation: Focuses on determining data requirements and initial data collection processes.
  • Data Preparation: Covers activities related to constructing data sets, including data cleaning and feature engineering.
  • Modeling: Explores the development of predictive and descriptive models, refining data preparation and model specification.
  • Model Evaluation: Discusses the process of evaluating model quality and ensuring it addresses business problems.
  • Deployment and Feedback: Information on deploying models to production environments and gathering feedback for improvement.
  • Ongoing Value Through Good Methodology: Highlights the importance of iterative problem-solving and continuous improvement in data science.

John B. Rollins, Ph.D.

IBM Analytics | IBM Corporation

Foundational Data Science Methodology

© 2015 IBM Corporation

Introduction
Why we are interested in data science
- Solve problems and answer questions
- Gain useful insights through modeling to predict outcomes or discover underlying patterns

Rapidly evolving technologies
- Platform growth
- In-database analytics
- Text analysis
- Automation


Data science methodology


Why?
- To provide a guiding strategy

What?
- General strategy that guides the processes and activities within a given domain
- Does not depend on particular technologies or tools
- Not a set of techniques or recipes
- Provides the data scientist with a framework for how to proceed to obtain answers


Methodology diagram
[Diagram: an iterative cycle of ten stages: Business Understanding, Analytic Approach, Data Requirements, Data Collection, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment, Feedback, then back to Business Understanding.]


Business understanding
Every project begins with business understanding.
- Clearly defining project objectives and requirements from the business perspective is key to a successful solution
- Business sponsors are most critical in this stage
  - Define problem and solution requirements
- Business sponsors are involved throughout the project
  - Provide domain expertise
  - Review intermediate findings
  - Ensure that the work generates the intended solution


Analytic approach
With a clear definition of the business problem, we define the analytic approach to solving the problem.
- Express the problem in the context of statistical and machine learning techniques
- Identify suitable technique(s)
- Examples
  - Classification to predict response to a promotion ("yes" or "no")
  - Clustering and associations for customer segmentation and market basket analysis


Data compilation
The chosen analytic approach determines the data requirements.
- Content, formats, representations

Initial data collection is performed.
- Available data resources (structured, unstructured, semi-structured) relevant to the problem domain
- Decide whether to obtain less-accessible data elements
- Revise data requirements or collect more data, if needed

Then data understanding is gained.
- Descriptive statistics and visualization
- Content, quality, initial insights about data
- Additional data collection to fill gaps, if needed
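
As a rough illustration of the descriptive statistics mentioned under data understanding, the sketch below summarizes a single numeric column and counts its gaps. The `describe` helper and the `ages` column are hypothetical and not part of the original slides; only the Python standard library is assumed.

```python
import statistics

def describe(values):
    """Summarize one numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),                    # usable observations
        "missing": len(values) - len(present),    # gaps that may need more collection
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "min": min(present),
        "max": max(present),
    }

# Hypothetical column of customer ages with one gap.
ages = [34, 45, None, 29, 52]
summary = describe(ages)
print(summary)
```

A non-zero `missing` count is the kind of initial insight that might prompt additional data collection.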

Data preparation
Data preparation encompasses all activities to construct the data set.
- Data cleaning
  - Missing or invalid values
  - Eliminating duplicate rows
  - Formatting properly
- Combining multiple data sources
- Transforming data
- Feature engineering
- Text analysis

Accelerate data preparation by automating common steps
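
One way to automate the common cleaning steps is to wrap them in a reusable function. The sketch below is a minimal illustration, assuming records arrive as dictionaries with hypothetical `name` and `age` fields. It drops rows with missing or invalid values, although imputing them would be an equally reasonable policy.

```python
def clean_rows(rows):
    """Apply three cleaning steps: drop invalid values, format, de-duplicate."""
    seen = set()
    cleaned = []
    for row in rows:
        # Formatting properly: trim whitespace and normalize capitalization.
        name = (row.get("name") or "").strip().title()
        age = row.get("age")
        # Missing or invalid values: skip rows without a name or a plausible age.
        if not name or not isinstance(age, int) or not 0 <= age <= 120:
            continue
        # Eliminating duplicate rows: keep only the first occurrence.
        key = (name, age)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "age": age})
    return cleaned

# Hypothetical raw records.
raw = [
    {"name": " alice ", "age": 34},
    {"name": "Alice", "age": 34},     # duplicate once formatted
    {"name": "bob", "age": None},     # missing value
    {"name": "carol", "age": 290},    # invalid value
    {"name": "dave", "age": 41},
]
cleaned = clean_rows(raw)
print(cleaned)
```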


Modeling
Modeling focuses on developing models.
- Predictive or descriptive models
- According to the previously defined analytic approach
- Training set for predictive modeling

Highly iterative process
- Intermediate insights lead to refinements in data preparation & model specification
- Multiple algorithms & parameters are tried to find the best model for a given technique
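
The idea of trying multiple parameters on a training set can be sketched with a deliberately simple model. The example below assumes a hypothetical one-rule classifier ("customer responds if past spend is at least a threshold") and made-up training data. It stands in for whatever technique the analytic approach actually selected.

```python
def accuracy(threshold, rows):
    """Share of rows where 'spend >= threshold' matches the observed response."""
    hits = sum((spend >= threshold) == responded for spend, responded in rows)
    return hits / len(rows)

def fit_threshold(train, candidates):
    """Try each candidate parameter and keep the one scoring best on the training set."""
    return max(candidates, key=lambda t: accuracy(t, train))

# Hypothetical training set: (past spend, responded to the promotion?)
train = [(10, False), (20, False), (35, False), (40, True), (55, True), (70, True)]
best = fit_threshold(train, candidates=[15, 30, 40, 60])
print(best, accuracy(best, train))
```

In practice the same loop would run over several algorithms as well as their parameters, and the winner would still need checking on data held out from training.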


Model evaluation
Model evaluation is performed during model development and before model deployment.
- Understand the model's quality
- Ensure that it properly addresses the business problem

Diagnostic measures
- Suitable to the modeling technique used
- Testing set
- Refine model as needed

Statistical significance tests
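
For a binary classifier, diagnostic measures on a testing set often include accuracy, precision, and recall. The sketch below computes them from hypothetical actual and predicted labels. The `evaluate` helper is illustrative, and other modeling techniques would call for different measures.

```python
def evaluate(actual, predicted):
    """Return accuracy, precision, and recall for binary (True/False) labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(a and p for a, p in pairs)                # true positives
    fp = sum((not a) and p for a, p in pairs)          # false positives
    fn = sum(a and (not p) for a, p in pairs)          # false negatives
    tn = sum((not a) and (not p) for a, p in pairs)    # true negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical testing set: observed responses versus model predictions.
actual = [True, True, False, False, True, False, True, False]
predicted = [True, False, False, True, True, False, True, False]
metrics = evaluate(actual, predicted)
print(metrics)
```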


Deployment and feedback


Once finalized, the model is deployed into a production environment.
- May be in a limited / test environment until model is proven
- Involves additional groups, skills, and technologies
  - Solution owner
  - Marketing
  - Application developers
  - IT administration

Feedback to assess model performance
- Gathering and analysis of feedback for assessment of the model's performance and impact
- Iterative process for model refinement and redeployment
- Accelerate through automated processes
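
An automated feedback check could be as simple as comparing accuracy observed in production with the accuracy measured during evaluation. The sketch below is one hypothetical way to do that. The `needs_refinement` function, the 0.05 tolerance, and the sample outcomes are all assumptions made for illustration.

```python
def needs_refinement(baseline_accuracy, feedback_outcomes, tolerance=0.05):
    """Flag the model when observed accuracy falls more than `tolerance` below baseline.

    feedback_outcomes holds one boolean per production prediction:
    True if the prediction turned out to be correct.
    """
    observed = sum(feedback_outcomes) / len(feedback_outcomes)
    return observed < baseline_accuracy - tolerance

# Hypothetical feedback batches gathered after deployment.
degraded = [True] * 7 + [False] * 3    # 70% correct
steady = [True] * 8 + [False] * 2      # 80% correct
print(needs_refinement(0.80, degraded), needs_refinement(0.80, steady))
```

A flagged model would go back through refinement and redeployment, which is the iterative loop described above.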

Ongoing value through good methodology


The methodology diagram illustrates the iterative nature of problem-solving in a data science project.
Through feedback, refinement, and redeployment, models are continually improved and adapted to evolving conditions.
The model continues to provide value to the organization for as long as the solution is needed.

