
Introduction to Data Science and Machine Learning


LECTURE 1
Definitions

Data science: a field that works with and analyzes large amounts of data to provide meaningful information that can be used to make decisions and solve problems.
Definitions
• Data set: a collection of data, particularly one that is structured. Data sets can be small and simple to work with, or large and complex.

• Data mining: the process data scientists use to find usable models and insights in data sets.

• Big data: a term for data sets so large that conventional tools such as Excel may be unable to handle them.
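To make the data set and data mining definitions concrete, here is a minimal Python sketch. The language, the transactions, and the function names are illustrative assumptions, not part of the lecture. Each customer transaction is one structured record, and the "mining" step looks for a simple pattern: which pairs of items are often bought together (the support-counting step used in association-rule mining).

```python
from collections import Counter
from itertools import combinations

# Hypothetical data set: each record is the set of items in one transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "jam"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
]


def count_item_pairs(records):
    """Count how often each pair of items appears in the same transaction."""
    pair_counts = Counter()
    for items in records:  # reads one record at a time from any iterable
        for pair in combinations(sorted(items), 2):
            pair_counts[pair] += 1
    return pair_counts


def frequent_pairs(records, min_count):
    """Keep only the pairs that appear in at least `min_count` transactions."""
    return {pair: n for pair, n in count_item_pairs(records).items() if n >= min_count}


print(frequent_pairs(transactions, min_count=3))
```

Because the loop consumes one record at a time, `records` could also be rows streamed from a file or a database cursor, which is one way to work with data too large to open in a spreadsheet.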
Definitions
• Algorithm: a series of repeatable steps, usually expressed mathematically, to accomplish a specific data science task or solve a problem.
• An algorithm is a set of instructions. In computer programming, a function is typically an implementation of an algorithm.
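As an illustration (the example is our choice, not the lecture's), Euclid's algorithm for the greatest common divisor is a classic series of repeatable steps. The Python function below is one implementation of it, which is the sense in which a function implements an algorithm.

```python
def gcd(a: int, b: int) -> int:
    """Euclid's algorithm: greatest common divisor of two non-negative integers."""
    # Repeat one step until the remainder is zero: replace (a, b) with (b, a mod b).
    while b != 0:
        a, b = b, a % b
    return a


print(gcd(48, 18))  # → 6
```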
Definitions

Artificial intelligence: the theory and development of computer systems able to perform tasks normally requiring human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages.
Example: unlocking your phone with Face ID.
Definitions

• Machine learning: the computational process in which a machine “learns” and adjusts its behavior based on feedback from data, using an adaptable algorithm.
• Machine learning helps computers predict outcomes without a human explicitly programming every decision.
• Machine-learning algorithms use statistics to find and apply patterns in data, and they now power much of the software we use every day.
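A minimal Python sketch of "learning from data" (the data, names, and choice of model are illustrative assumptions): instead of hand-coding the relationship between two quantities, the program estimates a straight line's slope and intercept from examples using ordinary least squares, one of the simplest statistical learning methods, and then predicts an outcome for an input it has not seen.

```python
def fit_line(xs, ys):
    """Learn a slope and intercept from example data by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x) (the 1/n factors cancel);
    # the fitted line passes through the point (mean_x, mean_y).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: advertising spend (in thousands) vs. units sold.
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units = [12.0, 15.0, 15.0, 19.0, 19.0]

slope, intercept = fit_line(spend, units)


def predict(x):
    """Use the learned parameters to predict units sold for a new spend level."""
    return slope * x + intercept


print(f"slope={slope:.2f}, intercept={intercept:.2f}, prediction at 6.0: {predict(6.0):.2f}")
# prints slope=1.80, intercept=10.60, prediction at 6.0: 21.40
```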
AI and Machine Learning

Artificial intelligence is the broader concept of machines being able to carry out tasks in a way that we would consider “smart”.
Machine learning is a current application of AI based on the idea that we should be able to give machines access to data and let them learn for themselves.
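A toy Python contrast (hypothetical data and function names): both functions below make a "smart-looking" spam decision, but the first follows a cut-off a programmer chose by hand, while the second derives its cut-off from labeled examples, which is the machine learning idea of letting the data decide. The learner is deliberately simple: it picks the candidate threshold with the fewest training errors.

```python
# Hypothetical labeled examples: (message length in words, is_spam).
examples = [(5, False), (8, False), (12, False), (40, True), (55, True), (60, True)]


def hand_written_rule(length):
    """Explicit logic: a programmer decided by hand that long messages are spam."""
    return length > 30


def learn_threshold(data):
    """Learn a cut-off: the candidate that misclassifies the fewest examples."""
    def errors(t):
        return sum((length > t) != label for length, label in data)

    candidates = sorted(length for length, _ in data)
    return min(candidates, key=errors)


threshold = learn_threshold(examples)


def learned_rule(length):
    """Same kind of decision, but the cut-off came from the labeled examples."""
    return length > threshold


print(threshold, hand_written_rule(20), learned_rule(20))  # → 12 False True
```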
Back to Data Science

• Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data.
• Data science is related to data mining, machine learning, and big data.
• Data science includes work in computation, statistics, analytics, data mining, and programming.
• An important part of a data scientist’s job is recognizing which algorithms suit which tasks, since no single algorithm is a panacea for all problems.
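A small Python illustration of why no single method suits every task (our example, with made-up numbers): two standard ways of summarizing a "typical" order value react very differently to a single outlier.

```python
from statistics import mean, median

# Hypothetical order values in dollars; one order is an extreme outlier.
orders = [20, 22, 25, 24, 21, 500]

# Two methods for summarizing a "typical" order give very different answers.
print("mean:  ", mean(orders))    # pulled far upward by the single outlier
print("median:", median(orders))  # stays close to most of the orders
```

Here the mean is 102 while the median is 23, so the better choice depends on whether the outlier matters for the question being asked.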
Data Science Process

Figure: The data science process (annotation: “This is where machine learning comes in”).
Source: https://www.kdnuggets.com/2016/03/data-science-process.html
Data Science Process (S1)
• Step 1: Frame the problem
• The first thing to do before solving a problem is to define exactly what it is.
• You need to be able to translate data questions into something actionable.
• You’ll have to develop the intuition to turn scarce inputs into actionable outputs, and to ask the questions that nobody else is asking.
• Start by understanding the stakeholders’ goals and the underlying why behind their data questions. Before you start thinking of solutions, work with them to clearly define the problem.
Example: studying the sales of a product. You should ask questions like the following:
• Who are the customers?
• Why are they buying our product?
• How do we predict whether a customer is going to buy our product?
• What distinguishes segments that are performing well from those performing below expectations?
• How much money will we lose if we don’t actively sell the product to these groups?
Data Science Process (S2)

• Step 2: Collect the raw data needed for your problem
• Once you’ve defined the problem, you’ll need data that provides the insights needed to solve it.
• This part of the process involves thinking through what data you’ll need and finding ways to get it.
• Data can come from internal databases or from purchased external datasets.
• Data costs money!

Source: https://www.questionpro.com/blog/data-collection-methods/
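One common case is pulling data from an internal database. The Python sketch below uses the standard-library `sqlite3` module with an in-memory database as a stand-in; the `sales` table, its columns, and its rows are hypothetical, and a real project would connect to the organization's own database instead.

```python
import sqlite3

# Stand-in for an internal company database: an in-memory SQLite database
# with a hypothetical `sales` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("alice", "north", 120.0), ("bob", "south", 80.0), ("carol", "north", 45.5)],
)

# Pull only the raw data the problem needs: here, sales in one region.
rows = conn.execute(
    "SELECT customer, amount FROM sales WHERE region = ? ORDER BY customer",
    ("north",),
).fetchall()
conn.close()

print(rows)  # → [('alice', 120.0), ('carol', 45.5)]
```

Using `?` placeholders rather than pasting values into the SQL string keeps the query safe from injection.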
Data Science Process (S3)
• Step 3: Process/prepare the data for analysis
• Now you have raw data that needs to be processed before you can do any analysis.
• Data can be quite messy, especially if it hasn’t been well maintained.
• You may see values set to null that are really zero, duplicate values, and missing values.
You’ll want to check for the following common errors:
• Missing values (e.g., customers without an initial contact date)
• Corrupted values, such as invalid entries
• Systematic differences (e.g., timestamps recorded in different time zones)
• Date range errors: dates that make no sense, such as records dated before sales started
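A minimal Python sketch of these checks, assuming records arrive as dictionaries with hypothetical `id`, `first_contact`, and `age` fields and a hypothetical sales start date. It flags duplicates, missing values, corrupted entries, and out-of-range dates, and converts timestamps to UTC so that records from different time zones are compared consistently.

```python
from datetime import datetime, timezone

# Hypothetical date on which sales started.
SALES_START = datetime(2020, 1, 1, tzinfo=timezone.utc)

# Hypothetical raw customer records with typical problems.
raw = [
    {"id": 1, "first_contact": "2021-03-04T10:00:00+00:00", "age": "34"},
    {"id": 2, "first_contact": None, "age": "29"},                          # missing value
    {"id": 3, "first_contact": "2021-05-01T09:00:00-05:00", "age": "abc"},  # corrupted value
    {"id": 3, "first_contact": "2021-05-01T09:00:00-05:00", "age": "abc"},  # duplicate row
    {"id": 4, "first_contact": "2019-06-30T12:00:00+00:00", "age": "41"},   # before sales started
]


def find_problems(records):
    """Return (id, problem) pairs for common data-quality errors."""
    problems = []
    seen_ids = set()
    for r in records:
        if r["id"] in seen_ids:
            problems.append((r["id"], "duplicate"))
            continue
        seen_ids.add(r["id"])

        if r["first_contact"] is None:
            problems.append((r["id"], "missing first_contact"))
        else:
            # Normalize to UTC so timestamps from different time zones compare correctly.
            when = datetime.fromisoformat(r["first_contact"]).astimezone(timezone.utc)
            if when < SALES_START:
                problems.append((r["id"], "date before sales started"))

        if r["age"] is None or not r["age"].isdigit():
            problems.append((r["id"], "invalid age"))
    return problems


for record_id, problem in find_problems(raw):
    print(record_id, problem)
```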
Data Science Process (S4)

• Step 4: Explore the data
• Look for the most interesting patterns that may help explain why certain things happen.
• This is where you begin to identify patterns worth analyzing more deeply.
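Exploration often starts with simple summaries. This minimal Python sketch, using hypothetical segments and order values, groups cleaned sales records by customer segment and reports the number of orders and the average order value, which can point to segments worth a closer look.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical cleaned sales records: (customer segment, order value in dollars).
sales = [
    ("students", 18.0), ("students", 22.0),
    ("families", 64.0), ("families", 58.0), ("families", 70.0),
    ("retirees", 35.0),
]


def summarize_by_segment(records):
    """Return {segment: (number of orders, average order value)}."""
    groups = defaultdict(list)
    for segment, value in records:
        groups[segment].append(value)
    return {segment: (len(values), mean(values)) for segment, values in groups.items()}


summary = summarize_by_segment(sales)
# Print segments from highest to lowest average order value.
for segment, (count, avg) in sorted(summary.items(), key=lambda item: item[1][1], reverse=True):
    print(f"{segment:<10} orders={count}  avg={avg:.2f}")
```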
Data Science Process (S5)

• Step 5: Perform in-depth analysis
• In this step you apply statistical, mathematical, and technological knowledge.
• In this course, this is where machine learning comes in.
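As one example of an in-depth method (the lecture does not prescribe one), the Python sketch below applies k-nearest neighbors, a simple machine learning classifier, to the Step 1 question of predicting whether a customer will buy the product. The features, training data, and `k=3` are hypothetical; a new customer is labeled by majority vote among the three most similar past customers.

```python
import math
from collections import Counter

# Hypothetical training data: (visits per month, minutes on site) -> bought the product?
training = [
    ((1, 2), False), ((2, 1), False), ((1, 4), False),
    ((8, 30), True), ((9, 25), True), ((7, 35), True),
]


def knn_predict(point, data, k=3):
    """Label `point` by majority vote among its k nearest training examples."""
    # Sort training examples by Euclidean distance to the new point.
    nearest = sorted(data, key=lambda example: math.dist(point, example[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_predict((6, 28), training))  # → True
print(knn_predict((2, 3), training))   # → False
```

In practice, features are usually scaled first; otherwise the feature with the largest range (here, minutes on site) dominates the distance.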
Data Science Process (S6)

• Step 6: Communicate results of the analysis
• Explain why the insights you’ve uncovered are important.
• Proper communication can mean the difference between action and inaction on your proposals.
• Not everyone in the company will know the math, but they all need to know why it’s a good model and how to use it.
