Meeting 12 Business Understanding and Data Collection 1
Data Collection
MEETING XII
© IBM 2020
Foundational Methodology for Data Science
Business Understanding
Analytic Approach
Once the business problem has been clearly stated, the data scientist can
define the analytic approach to solving the problem.
This stage entails expressing the problem in the context of statistical and
machine-learning techniques, so the organization can identify the most
suitable ones for the desired outcome.
In brief, the analytic approach is how the problem is expressed in terms of
statistical and machine-learning techniques.
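As a rough illustration of expressing a problem in terms of machine-learning techniques, the sketch below maps a kind of business question to a family of techniques. The question labels and the `choose_approach` helper are hypothetical names invented for this example; the mapping is a common rule of thumb, not a list taken from the slides.

```python
# Illustrative mapping from the kind of business question to a technique family.
# The keys are hypothetical labels chosen for this sketch.
APPROACHES = {
    "how_much": "regression",            # predict a numeric value
    "which_category": "classification",  # predict a label
    "which_groups": "clustering",        # discover groups without labels
}

def choose_approach(question_type):
    """Return the technique family for a question type, or raise for unknown types."""
    try:
        return APPROACHES[question_type]
    except KeyError:
        raise ValueError(f"no approach registered for {question_type!r}") from None

print(choose_approach("how_much"))
```

A question such as "how much revenue will a new site earn?" would fall under `"how_much"` in this sketch, which points toward regression.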
Data Requirements
Data Collection
In the initial data collection stage, data scientists identify and gather the
available data resources—structured, unstructured and semi-structured—
relevant to the problem domain.
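The three kinds of data named above can be contrasted with a minimal sketch that uses only the Python standard library. The sample strings, field names, and helper functions are made up for illustration; real projects would read from files, databases, or APIs instead.

```python
import csv
import io
import json

# Hypothetical in-memory samples standing in for real files or API responses.
structured_csv = "store_id,city,revenue\n1,Jakarta,120\n2,Bandung,95\n"
semi_structured_json = '{"store_id": 1, "reviews": [{"stars": 4, "text": "Fast service"}]}'
unstructured_text = "Customers said the new location was hard to find."

def load_structured(text):
    """Parse CSV text into a list of row dictionaries (fixed columns)."""
    return list(csv.DictReader(io.StringIO(text)))

def load_semi_structured(text):
    """Parse JSON text into nested Python objects (flexible schema)."""
    return json.loads(text)

def load_unstructured(text):
    """Split free text into lowercase word tokens; there is no schema at all."""
    return text.lower().split()

rows = load_structured(structured_csv)
doc = load_semi_structured(semi_structured_json)
tokens = load_unstructured(unstructured_text)
print(len(rows), doc["reviews"][0]["stars"], len(tokens))
```

Structured data arrives with fixed columns, semi-structured data carries its own nested keys, and unstructured text needs processing before it can be analyzed.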
Gathering Data
Open Data
Public Data vs Open Data
Open Data Sources
There are several free Open Data sources anyone can use, such as:
IBM Data Asset eXchange (DAX)
DAX is an online hub where developers and data scientists can find carefully
curated free and open datasets under open data licenses.
While there are many resources available online for finding open datasets,
DAX is unique in its high level of quality and curation.
Examples of the sorts of datasets we’re releasing are the Finance
Proposition Bank and Contracts Proposition Bank datasets. These
datasets are part of an active research program from IBM Research.
Practice
Practice – Topic 1
Practice – Topic 2
Restaurant Revenue Prediction
With over 1,200 quick service restaurants across the globe, TFI is the company behind some
of the world's most well-known brands: Burger King, Sbarro, Popeyes, Usta Donerci, and
Arby’s. They employ over 20,000 people in Europe and Asia and make significant daily
investments in developing new restaurant sites.
Right now, deciding when and where to open new restaurants is largely a subjective process
based on the personal judgement and experience of development teams. This subjective
data is difficult to accurately extrapolate across geographies and cultures.
New restaurant sites take large investments of time and capital to get up and running. When
the wrong location for a restaurant brand is chosen, the site closes within 18 months and
operating losses are incurred.
Finding a mathematical model to increase the effectiveness of investments in new restaurant
sites would allow TFI to invest more in other important business areas, like sustainability,
innovation, and training for new employees. Using demographic, real estate, and commercial
data, this competition challenges you to predict the annual restaurant sales of 100,000
regional locations.
Discuss the business problem and solution that you can provide with the dataset.
Download the dataset here
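As one possible starting point for the discussion, predicting annual sales can be framed as a regression problem. The sketch below fits a one-feature least-squares line using only the standard library. The feature (nearby population) and all numbers are invented for illustration and do not come from the TFI dataset; a real solution would use the dataset's own columns and a richer model.

```python
# Toy baseline: ordinary least squares with a single feature.
# All values below are made up for illustration, not taken from the TFI dataset.

def fit_line(xs, ys):
    """Return (slope, intercept) minimising squared error for y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new feature value."""
    slope, intercept = model
    return slope * x + intercept

# Hypothetical feature: nearby population (thousands); target: annual revenue (millions).
population = [10.0, 20.0, 30.0, 40.0]
revenue = [2.0, 4.0, 6.0, 8.0]

model = fit_line(population, revenue)
print(predict(model, 25.0))
```

A baseline like this mainly gives the team a reference error to beat before trying models that use demographic, real estate, and commercial features together.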
Practice – Topic 3
Thank You
©Copyright IBM Corporation 2020. All rights reserved. The information contained in these materials is provided for informational purposes only,
and is provided AS IS without warranty of any kind, express or implied. Any statement of direction represents IBM’s current intent, is subject to
change or withdrawal, and represents only goals and objectives. IBM, the IBM logo, and other IBM products and services are trademarks of the
International Business Machines Corporation, in the United States, other countries or both. Other company, product, or service names may be
trademarks or service marks of others.