Data Science

What is Data Science and why is it important?
1
Speaker
Angel Sevilla Camins

Chapter Lead Data Science
@ Eraneos Netherlands
2
What is Data Science?
3
What is Data Science?
Definition:
Data science combines the scientific
method, math and statistics, specialized
programming, advanced analytics, AI,
and even storytelling to uncover and
explain the business insights buried in
data.
Source: https://www.ibm.com/cloud/learn/data-science-introduction & https://ge.iitm.ac.in/I2MP/data-science/

4
Data science encompasses:
Preparing data for analysis and processing
Performing advanced data analysis using complex algorithms -> MODELS
Transforming patterns into predictions that support business decision making
Validating the results through scientifically designed tests and experiments
Presenting the results in visualizations that enable stakeholders to draw informed
conclusions
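The five steps above can be sketched end to end on a toy problem. This is a minimal illustration, not a real project: the data is synthetic (a made-up relation between advertising spend and sales), the model is a plain least-squares line, and a printed summary stands in for a visualization.

```python
import random
import statistics

# Hypothetical raw records: (advertising spend, sales); None marks a missing value.
random.seed(0)
raw = [(x, 3.0 * x + 5.0 + random.gauss(0, 1.0)) for x in range(1, 41)]
raw[7] = (8, None)  # simulate a record that needs cleaning

# 1. Prepare: drop incomplete records.
clean = [(x, y) for x, y in raw if y is not None]

# 2. Split: hold out data the model never sees during fitting.
random.shuffle(clean)
train, test = clean[:30], clean[30:]

# 3. Model: fit y = slope * x + intercept by ordinary least squares.
def fit_line(points):
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(train)

# 4. Predict and validate on the held-out records (mean absolute error).
def predict(x):
    return slope * x + intercept

mae = statistics.fmean([abs(predict(x) - y) for x, y in test])

# 5. Present: a one-line summary stands in for a chart here.
print(f"sales ~ {slope:.2f} * spend + {intercept:.2f}; hold-out MAE = {mae:.2f}")
```

Validating on held-out records rather than on the training data is the simplest form of the "scientifically designed tests" mentioned above.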
5
Data Science: ML VS AI
6
Machine Learning (ML) is a subset of
Artificial Intelligence (AI)
Source: https://www2.deloitte.com/nl/nl/pages/data-analytics/articles/part-1-artificial-intelligence-defined.html
7
What is Machine Learning?
Definition:
Machine Learning (ML) provides systems
the ability to automatically learn and
improve from experience and/or from
past cases, without being explicitly
programmed.
Machine learning focuses on the
development of computer programs that
can access data and use it to learn for
themselves.
However, if there is no past data, no
predictions can be made.
Source: https://www.expert.ai/blog/machine-learning-definition/
8
How Does Machine Learning Work?
We want a computer to perform tasks for us.
We can program it by hand …
if pixel1 == 'white':
    if pixel2 == 'black':
        …
        if pixel256 == 'brown':
            return 'dog'
But it is better if we can show the computer examples from which it can learn by itself.
Source: https://marutitech.com/artificial-intelligence-and-machine-learning/#What_is_Machine_Learning
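To make the contrast with hand-written pixel rules concrete, here is a minimal sketch of learning from examples. Everything in it is an assumption for illustration: each "image" is reduced to two made-up features, and the learner is a simple nearest-centroid classifier rather than any particular method from the talk.

```python
# Each hypothetical "image" is reduced to two made-up features:
# (average brightness, fraction of brown pixels), both in [0, 1].
examples = [
    ((0.30, 0.70), "dog"),
    ((0.35, 0.65), "dog"),
    ((0.25, 0.80), "dog"),
    ((0.80, 0.10), "cat"),
    ((0.75, 0.20), "cat"),
    ((0.85, 0.15), "cat"),
]

def train(labelled):
    """Learn one centroid (mean feature vector) per label."""
    sums, counts = {}, {}
    for features, label in labelled:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Return the label whose centroid is closest to the new example."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))

centroids = train(examples)
print(predict(centroids, (0.28, 0.75)))  # closest to the dog examples
print(predict(centroids, (0.90, 0.05)))  # closest to the cat examples
```

No rule about individual pixels is written by hand; changing the examples changes the behaviour.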
9
Artificial Intelligence (AI)
John McCarthy defines this discipline as:
“AI is the science and engineering of
making intelligent machines, especially
intelligent computer programs.”
Challenges:
• Machine learning
• Computer Vision
• Natural language processing (NLP)
• Robotics & Motion
• Planning and optimization
• Knowledge capture
Sources: https://softengi.com/projects/ai-pharma-defects-identification-at-production-line/
https://www.checkhub.io/category/artificial-intelligence/
https://www2.deloitte.com/nl/nl/pages/data-analytics/articles/part-1-artificial-intelligence-defined.html
10
Computer Vision
According to Prof. Fei-Fei Li:
“A subset of mainstream artificial intelligence that deals with the science of making
computers or machines visually enabled, i.e., they can analyze and understand an image.”
Sources: https://en.wikipedia.org/wiki/Computer_vision
11
Computer Vision
Applications:
• Facial Recognition
Computer vision also plays an
important role in facial recognition
applications, the technology that
enables computers to match images of
people’s faces to their identities.
• Augmented Reality
Augmented reality is the technology
that enables computing devices such as
smartphones, tablets and smart glasses
to overlay and embed virtual objects on
real world imagery.
Sources: https://towardsdatascience.com/everything-you-ever-wanted-to-know-about-computer-vision-heres-a-look-why-it-s-so-awesome-e8a58dfb641e
https://www.internationalairportreview.com/news/111201/facial-recognition-klia/
https://www.forbes.com/sites/theyec/2019/02/06/augmented-reality-in-business-how-ar-may-change-the-way-we-work/
12
Computer Vision
Applications:
• Self-Driving Cars
Computer vision enables self-driving cars
to make sense of their surroundings. The
self-driving car can then steer its way on
streets and highways, avoid hitting
obstacles, and (hopefully) safely drive its
passengers to their destination.
• Image processing in biomedicine
Computer vision algorithms can help
automate tasks such as detecting
cancerous moles in skin images or finding
symptoms in x-ray and MRI scans.
Sources: https://towardsdatascience.com/everything-you-ever-wanted-to-know-about-computer-vision-heres-a-look-why-it-s-so-awesome-e8a58dfb641e
https://www.roboticsbusinessreview.com/unmanned/consumer-acceptance-of-self-driving-cars-soars-study-says
https://vision.in.tum.de/research/biomed
13
Natural Language Processing
Natural language processing (NLP) is a subfield of artificial intelligence (AI) and deals with
how to program computers to process and analyze large amounts of natural
language data:
• Natural Language Processing (NLP) studies how machines understand human
language.
• Its goal is to build systems that can make sense of text and perform tasks like
translation, grammar checking, or text generation.
Word2Vec example (Skip-Gram): predict, given a center word, the most likely words in a fixed-size
window around it.
Sources: https://en.wikipedia.org/wiki/Natural_language_processing
https://towardsdatascience.com/word2vec-to-transformers-caf5a3daa08a
https://monkeylearn.com/blog/nlp-ai
14
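The Skip-Gram idea from the Word2Vec example can be illustrated by building the (center word, context word) pairs such a model would be trained on. This sketch only generates the pairs; it does not train any embeddings, and the function name and sample sentence are made up for illustration.

```python
def skipgram_pairs(tokens, window=2):
    """Yield (center, context) pairs for every word within `window` positions."""
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, tokens[j]

sentence = "the quick brown fox jumps".split()
pairs = list(skipgram_pairs(sentence, window=1))
print(pairs)
```

A larger window yields more pairs per center word and captures broader context.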
Natural Language Processing
Applications:
• Machine Translation
Machine Translation is the subfield of computational
linguistics which involves the use of software
applications to translate text or speech from one
language to another.
• Conversational User Interface
A conversational user interface is an interface for
computers that emulates a conversation with a
real human. For example, a chatbot.
• Text Prediction
Text prediction refers to the process of estimating
the next word in a phrase or sentence. Popular
examples of text prediction include
Google Search, BERT (Bidirectional Encoder
Representations from Transformers), and ChatGPT
(Generative Pre-trained Transformer).
Sources: https://insights.daffodilsw.com/blog/7-interesting-applications-of-natural-language-processing-nlp
https://medium.com/voice-tech-podcast/build-a-chatbot-using-c-and-dialogflow-93b50be39d7c
https://www.codemotion.com/magazine/dev-hub/machine-learning-dev/bert-how-google-changed-nlp-and-how-to-benefit-from-this/
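Text prediction in its simplest form can be sketched with a bigram model: count which word most often follows each word in some text, then suggest that word. This is far simpler than BERT or ChatGPT, and the tiny corpus below is invented purely for illustration.

```python
from collections import Counter, defaultdict

corpus = (
    "data science is fun . "
    "data science is useful . "
    "machine learning is fun . "
    "data engineering is useful ."
).split()

# Count how often each word follows each other word.
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(word):
    """Return the most frequent follower of `word`, or None if unseen."""
    if word not in following:
        return None
    return following[word].most_common(1)[0][0]

print(predict_next("data"))      # "science" follows "data" most often
print(predict_next("learning"))  # only "is" ever follows "learning"
print(predict_next("quantum"))   # never seen, so no prediction
```

The last call shows the earlier point in miniature: with no past data for a word, no prediction can be made.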
15
Real life Data Science Projects
16
Apotheek voorzorg
Built a Digital Twin and an AI Optimization Algorithm:
Use case discovery phase

▪ Optimizing pill packing machine configuration in production line
Quick scan phase
▪ Built a Digital Twin of whole production chain to run simulations
and assess impact
▪ Built an Optimization Algorithm that was able to perform better
than baseline.
Proof of value phase
▪ Demonstrated a 14%-point increase in efficiency, converting to €50k
yearly savings per %-point; €500k savings per year
▪ Delivered a Planner Module to automatically allocate right
prescriptions to right machine at the right time to increase
efficiency
Integration phase
▪ Built a production-ready app on Azure
17
Bridgestone
Scalable Data Hub:
We built a scalable Data Hub with a ‘sandbox environment’ allowing
different teams within Bridgestone to automatically request and
receive their own isolated environment where they can experiment,
log data, and build use cases that are easily deployed.
▪ Rolled out an Azure Data Hub within 2 months
▪ Able to store data on premises and send it in batches to the Cloud for
more security
▪ Built a tyre quality prediction model as an Applied AI use case
Components within the Bridgestone Data Hub:
✓ Data Warehousing
✓ Deployed using MLflow within Databricks
✓ CI/CD
18
Reckitt
Scalable Data Hub:
We built an Azure Data Hub that ingests product data from each
region and country and each month creates a forecasting model per
product. More than 30,000 forecasting models are created
automatically on a monthly basis.
▪ Product data is added into advanced dashboards and can be
analyzed on country, region, market, and product levels
▪ We built a sophisticated dynamic forecasting solution able to
predict the performance of each product segment within every
market up to 3 years in the future
▪ Models are equipped with a Covid impact regressor for high
accuracy
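The "one forecasting model per product" pattern can be sketched as a loop that groups the history by product and fits a small model for each group. The slides do not say which model type was used, so a plain linear trend stands in here; the product names and numbers are invented, and the Covid impact regressor is deliberately left out to keep the sketch short.

```python
from collections import defaultdict

# Hypothetical monthly sales history: (product, month_index, units_sold).
history = [
    ("soap", 0, 100.0), ("soap", 1, 110.0), ("soap", 2, 120.0), ("soap", 3, 130.0),
    ("spray", 0, 50.0), ("spray", 1, 48.0), ("spray", 2, 46.0), ("spray", 3, 44.0),
]

def fit_trend(points):
    """Least-squares line through (month, units) points; returns (slope, intercept)."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_t

# Group the history by product, then fit one small model per product.
by_product = defaultdict(list)
for product, month, units in history:
    by_product[product].append((month, units))

models = {product: fit_trend(points) for product, points in by_product.items()}

def forecast(product, month):
    """Extrapolate the fitted trend of one product to a future month index."""
    slope, intercept = models[product]
    return slope * month + intercept

for product in models:
    print(product, round(forecast(product, 6), 1))
```

Because each product gets its own independent model, the same loop scales to tens of thousands of products by running the fits in parallel.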
19
MLOps
20
MLOps
Elements for ML systems:
Source: “Hidden Technical Debt in Machine Learning Systems” Google NIPS 2015
21
MLOps
ML Cycle: Data → Model → Deployment
22
MLOps
23
DevOps
Definitions:
• DevOps = Dev (Development) + Ops (Operations)
• DevOps is a set of practices that combines:
• Software development (Dev)
• IT operations (Ops)
• It aims to shorten the systems development life cycle and provide continuous delivery
with high software quality
Source: https://en.wikipedia.org/wiki/DevOps
24
DevOps
CI/CD:
• Continuous integration (CI) is the practice of automating the integration of code
changes from multiple contributors into a single software project.
• Continuous deployment (CD) is a strategy in software development where code
changes to an application are released automatically into the production environment.
Source: https://en.wikipedia.org/wiki/DevOps
25
MLOps
• An ML system is a software system, so
similar practices apply to help guarantee
that you can reliably build and operate ML
systems at scale.
• However, in ML, there are a few notable
differences:
• CI is no longer only about testing and
validating code and components, but
also testing and validating data, data
schemas, and models.
• CD is no longer about a single
software package or a service, but a
system (an ML training pipeline) that
should automatically deploy another
service (model prediction service).
Source: https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning
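What "testing and validating data, data schemas, and models" might look like inside a CI step can be sketched with two small checks. The schema, column names, and accuracy figures below are hypothetical, and real pipelines would typically use dedicated validation tooling rather than hand-written functions.

```python
# Hypothetical expected schema for incoming training rows.
EXPECTED_SCHEMA = {"age": int, "income": float, "churned": bool}

def validate_rows(rows, schema=EXPECTED_SCHEMA):
    """Return a list of human-readable schema violations (empty means OK)."""
    errors = []
    for index, row in enumerate(rows):
        missing = schema.keys() - row.keys()
        if missing:
            errors.append(f"row {index}: missing columns {sorted(missing)}")
        for column, expected_type in schema.items():
            if column in row and not isinstance(row[column], expected_type):
                errors.append(f"row {index}: {column} is not {expected_type.__name__}")
    return errors

def model_passes_gate(accuracy, baseline_accuracy, min_gain=0.0):
    """Only promote a candidate model that is at least as good as the baseline."""
    return accuracy >= baseline_accuracy + min_gain

good = [{"age": 41, "income": 52000.0, "churned": False}]
bad = [{"age": "41", "income": 52000.0}]  # wrong type for age, churned missing

print(validate_rows(good))
print(validate_rows(bad))
print(model_passes_gate(accuracy=0.91, baseline_accuracy=0.88))
```

In a pipeline, a non-empty error list or a failed gate would stop the run before the new model reaches the prediction service.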
26
Explainable AI
27
Explainable AI
AI is already making decisions in several
fields:
• Healthcare
• Automobile
• Banking and Finance
• Surveillance
• Social Media
• Entertainment
• Education
• Space Exploration
• Gaming
• Robotics
• Agriculture
• E-Commerce
28
Explainable AI
• AI is already making decisions in several fields,
and it is sometimes wrong.
• The current generation of AI systems offers
tremendous benefits, but these systems must explain their
decisions and actions to users.
• Users should understand, appropriately trust, and
effectively manage AI.
29
Explainable AI
Need for explainability in AI:
• Models with high sensitivity
• Attacks
• Fairness
• Legal and regulatory concerns:
• GDPR
• Algorithmic Accountability Act of 2019
30
Explainable AI
Explainable Artificial Intelligence (XAI) aims to:
• Produce more explainable models, while maintaining a high level of learning
performance (prediction accuracy)
• Enable human users to understand, appropriately trust, and effectively manage the
emerging generation of artificially intelligent partners
Source: https://www.darpa.mil/program/explainable-artificial-intelligence
31
Explainable AI
SHAP
32
Explainable AI
Shapley values
• The Shapley value is a method for assigning payouts to players depending on their
contribution to the total payout
• Applying that to ML, we define that:
• Each feature is a “player” in a game
• The prediction is the “payout”
• The Shapley value tells us how the “payout” (the prediction) can be fairly distributed among the
features, i.e., how much each feature contributes
Source: https://towardsdatascience.com/shap-explained-the-way-i-wish-someone-explained-it-to-me-ab81cc69ef30
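For a handful of features, Shapley values can be computed exactly by averaging each feature's marginal contribution over every order in which the features could "join the game". The toy pricing model below is invented for illustration. The number of orderings grows factorially with the number of features, so this brute-force approach only works for very small cases.

```python
from itertools import permutations

# Hypothetical toy model: predicted house price from the set of known features.
def model(features):
    price = 100.0                      # baseline prediction with no features known
    if "size" in features:
        price += 50.0
    if "location" in features:
        price += 30.0
    if "size" in features and "garden" in features:
        price += 20.0                  # a garden only adds value together with size
    return price

def shapley_values(players, value):
    """Average each player's marginal contribution over all join orders."""
    totals = {player: 0.0 for player in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = set()
        for player in order:
            before = value(coalition)
            coalition.add(player)
            totals[player] += value(coalition) - before
    return {player: total / len(orderings) for player, total in totals.items()}

features = ["size", "location", "garden"]
phi = shapley_values(features, model)
print(phi)
```

The contributions add up to the difference between the full prediction and the baseline, and the 20 of shared value between size and garden is split evenly between the two.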
33
Thank you!
34
