Data science: a field that works with and analyzes large amounts of data to
provide meaningful information that can be used to make decisions and solve
problems.
Definitions
Data set: a collection of data, particularly one that is specifically
structured. A data set can be small and simple to work with or large and
complex.
Big data: a term for data sets that are too large or complex for conventional
tools to handle. Excel, for example, is limited to 1,048,576 rows per worksheet.
Algorithm: a series of repeatable steps, usually expressed mathematically, to
accomplish a specific data science task or solve a problem.
An algorithm is a set of instructions. In computer programming, a function is
an implementation of an algorithm.
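To make the link between an algorithm and a function concrete, here is a minimal Python sketch. The algorithm is the arithmetic mean (add up the values, then divide by how many there are). The function name `mean` and the `daily_sales` figures are hypothetical and chosen only for illustration.

```python
def mean(values):
    """Implement the 'arithmetic mean' algorithm as a function."""
    if not values:
        raise ValueError("mean() needs at least one value")
    total = 0.0
    for value in values:          # Step 1: add up every value
        total += value
    return total / len(values)    # Step 2: divide by the number of values


# Hypothetical sample data, for illustration only.
daily_sales = [12, 15, 9, 20]
average = mean(daily_sales)
print(average)
```

The steps are repeatable: calling the same function on a different list follows exactly the same instructions.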
This is where machine learning comes in.
https://www.kdnuggets.com/2016/03/data-science-process.html
Data Science Process (S1)
Step 1: Frame the problem
The first thing you have to do before you solve a problem is to define exactly
what it is.
You need to be able to translate data questions into something actionable.
You’ll have to develop the intuition to turn scarce inputs into actionable
outputs, and to ask the questions that nobody else is asking.
You should start by understanding the stakeholders’ goals and the underlying
why behind their data questions. Before you can start thinking of solutions,
you’ll want to work with them to clearly define the problem.
Ex. Studying the Sales for a Product
You should ask questions like the following:
• Who are the customers?
• Why are they buying our product?
• How do we predict if a customer is going to buy our product?
• What is different between segments that are performing well and
those that are performing below expectations?
• How much money will we lose if we don’t actively sell the
product to these groups?
Data Science Process (S2)
https://www.questionpro.com/blog/data-collection-methods/
Data Science Process (S3)
Step 3: Process/prepare the data for analysis
Now you have raw data that needs to be processed before you can do any
analysis.
Data can be quite messy, especially if it hasn’t been well-maintained.
Typical problems include values set to null when they are really zero,
duplicate values, and missing values.
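The three problems named above can be handled in a single pass over the data. The sketch below uses only the Python standard library. The column names (`customer`, `units_sold`, `region`), the sample rows, and the helper `clean_rows` are all hypothetical. Treating a null `units_sold` as zero is an assumption that must be confirmed for each real data set.

```python
def clean_rows(rows, zero_if_null=("units_sold",)):
    """Return (complete_rows, incomplete_rows) after basic cleaning."""
    seen = set()
    complete, incomplete = [], []
    for row in rows:
        row = dict(row)  # copy so the raw data is left untouched
        # 1. Nulls that really mean zero (assumed for these columns only).
        for column in zero_if_null:
            if row.get(column) is None:
                row[column] = 0
        # 2. Skip exact duplicates of a row we have already kept.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        # 3. Set aside rows that still have missing values for review.
        if any(value is None for value in row.values()):
            incomplete.append(row)
        else:
            complete.append(row)
    return complete, incomplete


# Hypothetical raw data, for illustration only.
raw = [
    {"customer": "A", "units_sold": 3, "region": "North"},
    {"customer": "A", "units_sold": 3, "region": "North"},    # duplicate
    {"customer": "B", "units_sold": None, "region": "South"},  # null means 0
    {"customer": "C", "units_sold": 5, "region": None},        # missing value
]
complete, incomplete = clean_rows(raw)
print(len(complete), len(incomplete))
```

Rows with missing values are set aside rather than dropped, so the decision to fill or discard them stays with the analyst.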
You’ll want to check for the
following common errors: