
INC362: Introduction to Data Science for Automation Systems

10/08/2023
Dr. Teema Leangarun

Faculty of Engineering, Department of Control Systems and Instrumentation Engineering


King Mongkut’s University of Technology Thonburi

INC492 INTRODUCTION TO DATA SCIENCE 1


INC362: Introduction to Data Science for Automation Systems

• Course name: Introduction to Data Science for Automation Engineering (3 Credits)

• Date/Time: Thursday / 09.30 - 12.30

• Room No.: CB40609 and/or ONLINE by Zoom

• Lecturer:
• Dr. Teema Leangarun
• Office Hours: By appointment
• E-mail: teema.lea@kmutt.ac.th
• Facilitator: Chanapat Pramoulsilpchai (P’ Boon)
• Pre-requisites: INC242, INC261, or at the instructor's discretion
• Requirement: All students must bring their own private laptop to class.



Course Learning Outcomes (CLOs)

• CLO1: Develop programs to analyze data with intelligent techniques.

• CLO2: Deliver analytical and predictive data for automation systems (Data Visualization & Data Analytics)

• CLO3: Work together as a team on a project to create a multidisciplinary data presentation and analyze it using intelligent techniques.


Grading rubrics

Levels: Level 5 = Excellent (>= 80% to 100%); Level 4 = Good (>= 70% to 79%); Level 3 (expected) = Average (>= 60% to 69%); Level 2 = Fair (>= 50% to 59%); Level 1 = Poor/Minimal pass (< 50%)

CLO1: Develop programs to analyze data with intelligent techniques (data management)

PI 1.1 Gathering, Wrangling, Transformation, Organize, Clean, and Curate data
• Level 5: Be able to gather/manage/clean/transform data completely and accurately from one form to another with respect to the provided problem
• Level 4: Be able to gather/manage/clean/transform data completely or accurately from one form to another regarding the given problem
• Level 3: Be able to gather/manage/clean/transform data accurately from one form to another with some errors
• Level 2: Be able to partially gather/manage/clean/transform data from one form to another
• Level 1: Minimally and inaccurately gather/manage/clean/transform data from one form to another

CLO2: Deliver analytical and predictive data for automation systems (Data Visualization & Data Analytics)

PI 2.1 Building machine learning models
• Level 5: Be able to use various machine learning algorithms, apply various optimization algorithms, select proper evaluation metrics to improve model performance, and compare the performance of the models
• Level 4: Be able to use various machine learning algorithms, apply various optimization algorithms, and select proper evaluation metrics to improve model performance
• Level 3: Be able to use some machine learning algorithms and apply some optimization algorithms, but select evaluation metrics incorrectly when trying to improve model performance
• Level 2: Be able to select an appropriate machine learning model to solve a given classification, regression, or clustering problem
• Level 1: Be able to explain the basic concepts in some machine learning algorithms

PI 2.2 Creating data visualizations
• Level 5: Be able to creatively present data visualization or use more advanced tools to accurately show data relationships
• Level 4: Be able to present data visualization or use more basic tools to accurately show data relationships
• Level 3: Be able to present data visualization properly with descriptions or comparisons, per instructions
• Level 2: Be able to minimally present data visualization with partial descriptions or comparisons, per instructions
• Level 1: Does not include a data visualization, or includes a visualization with no annotation


Grading rubrics (contd.)

Levels: Level 5 = Excellent (>= 80% to 100%); Level 4 = Good (>= 70% to 79%); Level 3 (expected) = Average (>= 60% to 69%); Level 2 = Fair (>= 50% to 59%); Level 1 = Poor/Minimal pass (< 50%)

CLO3: Work together as a team on a project to create a multidisciplinary data presentation and analyze it using intelligent techniques.

PI 3.1 Interpreting data
• Level 5: Be able to accurately interpret insights of data, make inferences and predictions from data, or extract patterns from data
• Level 4: Be able to accurately interpret meaning to data, make inferences and predictions from data, or extract patterns from data
• Level 3: Be able to interpret meaning to data, make inferences and predictions from data, or extract patterns from data with some errors
• Level 2: Be able to interpret meaning to data, make inferences and predictions from data, or extract patterns from data with many errors
• Level 1: Inaccurately provide meaning to data, make inferences and predictions from data, or extract patterns from data

PI 3.2 Delivering oral presentation: Contents and Teamwork (Undergraduate students)

(1) Organization
• Level 5: Be able to present information in a logical, interesting sequence which the audience can easily follow
• Level 4: Be able to present information in a logical sequence which the audience can follow
• Level 3: Be able to present information, but the sequence jumps around
• Level 2: None
• Level 1: No apparent organization. Evidence is not used to support assertions.

(2) Knowledge
• Level 5: Be able to demonstrate full knowledge and can answer and elaborate on most/all questions asked
• Level 4: Be able to answer most questions asked satisfactorily, but fail to elaborate
• Level 3: Be comfortable with the information, but only able to answer simple questions
• Level 2: None
• Level 1: Do not appear to have a grasp of the information; cannot answer questions about the subject

(3) Teamwork
• Level 5: The team was perfectly coordinated, with clear guidelines about each member's role. Every member participated.
• Level 4: The team was mostly coordinated, but there were some moments of doubt and/or imbalance. A minority of the members of the group did not know what to do.
• Level 3: One or two members of the group carried most of the presentation. The rest of the group did not have clear instructions about their role.
• Level 2: None
• Level 1: The team did not know when to speak or what role they had. Only one person led the group.


Evaluation

                Module 1           Module 2                            Module 3        Total
                Data Management    Data Modeling & Model Evaluation    Term project    100%
Homework        10                 10                                  -               20
Term Project    -                  -                                   20              20
Paper Exam      20                 40                                  -               60
Total           30                 50                                  20              100


Criterion-Referenced Grading

Cut Score >= Grade


80 A
75 B+
70 B
65 C+
60 C
55 D+
50 D
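
The cut-score table can be read as a simple lookup: walk down the cut scores and return the first grade whose threshold the score meets. The sketch below is only an illustration of that reading. The function name `score_to_grade` is hypothetical, and the "F" returned below 50 is an assumption, since the slide does not name that grade.

```python
# Cut scores copied from the table above, highest first.
CUT_SCORES = [
    (80, "A"), (75, "B+"), (70, "B"), (65, "C+"),
    (60, "C"), (55, "D+"), (50, "D"),
]


def score_to_grade(score: float) -> str:
    """Return the first grade whose cut score is less than or equal to score."""
    for cut, grade in CUT_SCORES:
        if score >= cut:
            return grade
    return "F"  # assumed: the slide does not state the grade below 50


for s in (92, 74.9, 50, 49):
    print(s, score_to_grade(s))
```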



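Constructive Alignment Design

Each week lists the LOs (CLO, PI), the teaching learning activities (TLAs: topic/contents and teaching methods [1]), and the assessments (types of assessment [2], evidence, weighting).
Assessment weighting (100%): CLO1 (20%) = PI 1.1 (20%); CLO2 (70%) = PI 2.1 (43%) + PI 2.2 (27%); CLO3 (10%) = PI 3.1 (6%) + PI 3.2 (4%).

Week 1, 11/08/23: Introduction to data science
• CLO/PI: -
• Teaching methods: Lecture, Discussion, Demonstration
• Assessment: -
• Weighting: -

Week 2, 18/08/23: Python for DS Part 1: Numpy/Pandas
• CLO/PI: CLO1 (PI 1.1)
• Teaching methods: Lecture, Discussion, Demonstration
• Assessment: -
• Weighting: -

Week 3, 25/08/23: Python for DS Part 2: Matplotlib/Seaborn
• CLO/PI: CLO1 (PI 1.1), CLO2 (PI 2.2)
• Teaching methods: Lecture, Discussion, Demonstration
• Assessment: Homework; evidence: Python code
• Weighting: PI 1.1 = 1, PI 2.2 = 1

Week 4, 01/09/23: Data Wrangling and Data transform
• CLO/PI: CLO1 (PI 1.1), CLO2 (PI 2.2)
• Teaching methods: Lecture, Discussion, Demonstration
• Assessment: Homework; evidence: Python code with interpretation
• Weighting: PI 1.1 = 2, PI 2.2 = 2

Week 5, 08/09/23: Exploratory Data Analysis (EDA)
• CLO/PI: CLO1 (PI 1.1), CLO2 (PI 2.2)
• Teaching methods: Lecture, Discussion, Demonstration
• Assessment: Homework; evidence: Python code with interpretation
• Weighting: PI 1.1 = 2, PI 2.2 = 2

Week 6, 15/09/23: Module 1 Examination (Closed-book exam)
• CLO/PI: CLO1 (PI 1.1), CLO2 (PI 2.2)
• Teaching methods: -
• Assessment: Written exam (Standardized exam); evidence: Exam paper
• Weighting: PI 1.1 = 13, PI 2.2 = 7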


Constructive Alignment Design (cont.)

Assessment weighting (100%): CLO1 (20%) = PI 1.1 (20%); CLO2 (70%) = PI 2.1 (43%) + PI 2.2 (27%); CLO3 (10%) = PI 3.1 (6%) + PI 3.2 (4%).

Week 7, 22/09/23: Supervised learning: Regression problems (Linear, Multiple, Non-linear)
• CLO/PI: CLO2 (PI 2.1, PI 2.2)
• Teaching methods: Lecture, Discussion, Demonstration, Problem-Based
• Assessment: Homework; evidence: Python code with interpretation
• Weighting: PI 2.1 = 1.5, PI 2.2 = 0.5

Week 8, 29/09/23: Supervised learning: Classification problems 1 (Logistic regression, k-NN); Kick-off project
• CLO/PI: CLO2 (PI 2.1, PI 2.2)
• Teaching methods: Lecture, Discussion, Demonstration, Problem-Based
• Assessment: Homework; evidence: Python code with interpretation
• Weighting: PI 2.1 = 1.5, PI 2.2 = 0.5

Week 9, 06/10/23: Supervised learning: Classification problems 2 (Decision trees, SVM); (Learn it - LEB2); Q&A session via Zoom
• CLO/PI: CLO2 (PI 2.1, PI 2.2)
• Teaching methods: Lecture, Discussion, Demonstration, Problem-Based
• Assessment: Homework; evidence: Python code with interpretation
• Weighting: PI 2.1 = 1.5, PI 2.2 = 0.5

Week 10, 13/10/23: Supervised learning: Classification problems 3 (Neural networks); (Learn it - LEB2); Q&A session via Zoom
• CLO/PI: CLO2 (PI 2.1, PI 2.2)
• Teaching methods: Lecture, Discussion, Demonstration, Problem-Based
• Assessment: Homework; evidence: Python code with interpretation
• Weighting: PI 2.1 = 1.5, PI 2.2 = 0.5

Week 11, 20/10/23: Unsupervised learning: Clustering problems (k-means, principal component analysis); Azure ML; (Learn it - LEB2); Q&A session via Zoom
• CLO/PI: CLO2 (PI 2.1, PI 2.2)
• Teaching methods: Lecture, Discussion, Demonstration, Problem-Based
• Assessment: Homework; evidence: Azure ML block with interpretation
• Weighting: PI 2.1 = 2

Week 12, 27/10/23: Module 2 Examination (Closed-book exam)
• CLO/PI: CLO2 (PI 2.1, PI 2.2)
• Teaching methods: -
• Assessment: Written exam (Standardized exam); evidence: Exam paper
• Weighting: PI 2.1 = 30, PI 2.2 = 10


Constructive Alignment Design (cont.)

Assessment weighting (100%): CLO1 (20%) = PI 1.1 (20%); CLO2 (70%) = PI 2.1 (43%) + PI 2.2 (27%); CLO3 (10%) = PI 3.1 (6%) + PI 3.2 (4%).

Week 13, 03/11/23: Proposal presentation
• CLO/PI: CLO2 (PI 2.2), CLO3 (PI 3.1, PI 3.2)
• Teaching methods: Project-Based
• Assessment: Oral presentation; evidence: Presentation slide, Clip video
• Weighting: PI 2.2 = 1, PI 3.1 = 2, PI 3.2 = 2

Week 14, 10/11/23: Term project progress
• CLO/PI: CLO2 (PI 2.2), CLO3 (PI 3.1, PI 3.2)
• Teaching methods: Project-Based
• Assessment: Oral presentation; evidence: Presentation slide, Python code
• Weighting: -

Week 15, 17/11/23: Module 3 Final project presentation
• CLO/PI: CLO1 (PI 1.1), CLO2 (PI 2.1, PI 2.2), CLO3 (PI 3.1, PI 3.2)
• Teaching methods: Project-Based
• Assessment: Oral presentation; evidence: Presentation slide, Complete Python code, Final report
• Weighting: PI 1.1 = 2, PI 2.1 = 5, PI 2.2 = 2, PI 3.1 = 4, PI 3.2 = 2

Week 16, 24/11/23: Reserved


Week01
Introduction to Data Science and Applications

10/08/2023
Dr. Teema Leangarun

Faculty of Engineering, Department of Control Systems and Instrumentation Engineering


King Mongkut’s University of Technology Thonburi



What is the Data, Information, Knowledge, Wisdom (DIKW) hierarchy?

https://www.researchgate.net/figure/The-data-information-knowledge-wisdom-DIKW-hierarchy-as-a-pyramid-to-manage-knowledge_fig6_332400827



What does a data-driven culture really mean?

https://345.technology/technical/articles/what-does-a-data-driven-culture-really-mean/



Data team



Data Team – Roles & Responsibilities

• Support business analytics
• Support operational analytics
• Manage role-based dashboard in organization

• Operational technology (OT) integration
• Support citizen developer (low-code/no-code)
• Coordinate data pipeline and cleansing
• Manage data governance

• Data science services
• Data solution services
• Machine learning and AI services

Ref: https://pbs.twimg.com/media/FBlstfBXoAAuJWz.jpg:large


Full Stack Big Data Processing

• Deliver value by analyzing data and communicating the results to help make business decisions.

• Build and maintain data infrastructure and data systems, and set up the data warehouse, data pipeline, and databases.
• Focus on coding, cleaning up data sets, and implementing requests that come from data scientists.

• Use statistical techniques, data analysis methods, and machine learning to analyze data and solve business problems.

https://thanyavuth.medium.com/


Why we need to understand data analytics
manager/director/ staff

Production team
1) Operational level: monitor production performance using technical performance indicators
2) Management level: overview reports/dashboards for decision making

Source: https://www.akvkbi.com/p/bi.html



Understanding four main types of data analytics

• Reasoning
• Learning

• Optimization
• Rules
• Constraints

• Machine learning
• Forecasting
• Statistical analysis

• Alerts and drill down
• Ad hoc reports
• Standard reports

• Big data platform
• Content management
• RDBMS and integration

Source: https://polymathian.com/news-media/feature-articles/show-me-the-data-and-ill-show-you-the-value/



Layer 1: Descriptive Analytics - What has happened?

See how your processes are doing

From operational data to manufacturing service data

Keyword: Monitoring

▪ Performing real-time descriptive/statistical analysis for production monitoring
▪ Performing periodic assessments and producing reports of facility operations data via various statistical analyses and aggregations
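
As a rough illustration of the "statistical analyses and aggregations" mentioned above, the sketch below summarizes hourly output per production line using only the Python standard library. The line names and readings are hypothetical, invented purely for the example.

```python
from statistics import mean, stdev

# Hypothetical hourly output (units/hour) for two production lines.
readings = {
    "line_A": [98, 102, 101, 97, 100],
    "line_B": [88, 110, 95, 105, 92],
}


def summarize(values):
    """Descriptive summary of one line: what has happened so far?"""
    return {
        "count": len(values),
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
        "stdev": stdev(values),
    }


report = {line: summarize(values) for line, values in readings.items()}
for line, stats in report.items():
    print(line, stats)
```

In this made-up data both lines have a similar mean, but line_B varies far more from hour to hour, which is the kind of observation a monitoring report is meant to surface.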



Layer 2: Diagnostic Analytics – Why did it happen?

Analyze what happened in the past

• Alarm Management – Using a cluster of alarms in an alarm roll-up system based on appearances to find out what alarm likely set off an alarm cascade
• Quality Assurance – Identifying a fault related to a loss in quality and assessing its point of origin
• Failure Analysis – On failure of a process or machine, examining what could have happened to cause it

Source: https://www.vertech.com/blog/analytics-on-the-plant-floor
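
One simple way to picture the alarm management idea is to group alarms that occur close together in time and treat the earliest alarm in each group as a candidate trigger. This is only a sketch under stated assumptions: the alarm tags, the timestamps, and the 10-second grouping window are all hypothetical, and a real roll-up system would use richer logic.

```python
# Hypothetical alarm log: (seconds since shift start, alarm tag), sorted by time.
alarm_log = [
    (100, "PUMP_01_LOW_PRESSURE"),
    (102, "TANK_02_LOW_LEVEL"),
    (105, "LINE_03_FLOW_FAULT"),
    (900, "MOTOR_07_OVERTEMP"),
    (903, "CONVEYOR_02_STOP"),
]


def group_cascades(log, max_gap=10):
    """Group alarms whose gap to the previous alarm is at most max_gap seconds."""
    cascades = []
    for timestamp, tag in log:
        if cascades and timestamp - cascades[-1][-1][0] <= max_gap:
            cascades[-1].append((timestamp, tag))
        else:
            cascades.append([(timestamp, tag)])
    return cascades


cascades = group_cascades(alarm_log)
# The earliest alarm in each cascade is a candidate root cause, not a proven one.
likely_triggers = [cascade[0][1] for cascade in cascades]
print(likely_triggers)
```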



Layer 2: Diagnostic Analytics – Why did it happen? (contd.)

▪ Alarm Management: the display of all alarms in a single panel

Source: https://www.fabbricadigitale40.it/en/features/alarms-management-for-sme



Layer 2: Diagnostic Analytics – Why did it happen? (contd.)

▪ Alarm Management: the display of all alarms placed on the machines of a line

Source: https://www.fabbricadigitale40.it/en/features/alarms-management-for-sme



Layer 2: Diagnostic Analytics – Why did it happen? (contd.)
▪ Human task bottleneck analysis



Source: https://camunda.com/blog/2021/10/camunda-optimize-360-released/
Layer 3: Predictive Analytics – What will happen next?

Estimates what will happen in the future based on the data available now.

• Downtime prevention – Data gives operators a chance to proactively prevent or minimize downtime rather than simply react and diagnose.
• Output Forecasting – The amount of output that will be generated at current rates, based on known first principles of processes and current data trends.
• Supply Chain Forecasting – Demand for inputs from the supply chain.
• Revenue Forecasting – How much revenue can be made or lost at current trends.

Source: https://www.vertech.com/blog/analytics-on-the-plant-floor
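
The output forecasting bullet ("output that will be generated at current rates") can be sketched as a plain extrapolation: measure the rate so far and project it over the remaining time. The function name, the 8-hour shift, and the numbers are assumptions for illustration only. Real forecasts would also use process knowledge and trend data.

```python
def forecast_shift_output(units_so_far, hours_elapsed, shift_hours=8):
    """Project end-of-shift output by assuming the current rate continues."""
    if hours_elapsed <= 0:
        raise ValueError("hours_elapsed must be positive")
    rate = units_so_far / hours_elapsed           # units per hour so far
    remaining = shift_hours - hours_elapsed       # hours left in the shift
    return units_so_far + rate * remaining


# Hypothetical reading: 300 units produced after 3 hours of an 8-hour shift.
print(forecast_shift_output(300, 3))
```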



Layer 3: Predictive Analytics – What will happen next? (contd.)
▪ Revenue forecasting: online food ordering analysis

Source: https://www.boldbi.com/dashboard-examples/predictive-analytics-dashboard/online-food-ordering-analysis
Layer 4: Prescriptive Analytics – What should happen?

Combines the predictive methods with the ability to prescribe a solution that can prevent a forecasted negative outcome or facilitate a potential positive one.

• Process Optimization – Simulating your processes to see what can be tweaked to optimize each process
• Revenue Optimization – Figuring out how to maximize revenue by optimizing relevant processes
• Supply Chain Optimization – Optimizing the production, scheduling, and inventory in a supply chain

Source: https://www.vertech.com/blog/analytics-on-the-plant-floor
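
To make "simulating your processes to see what can be tweaked" concrete, the sketch below evaluates a toy process model over a range of allowed line speeds and prescribes the best one. The model (`simulated_good_units`), its quadratic scrap assumption, and the speed range are all hypothetical. The point is only the pattern: simulate each option, then recommend the option with the best outcome.

```python
def simulated_good_units(speed):
    """Hypothetical process model: faster lines make more units but scrap more."""
    units = 10 * speed
    scrap_rate = 0.001 * speed ** 2   # assumed: scrap grows with the square of speed
    return units * (1 - scrap_rate)


# Line speeds the process is allowed to run at (an assumed constraint).
candidate_speeds = range(1, 31)

# Prescriptive step: pick the setting with the best simulated outcome.
best_speed = max(candidate_speeds, key=simulated_good_units)
print(best_speed, round(simulated_good_units(best_speed), 2))
```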



Layer 4: Prescriptive Analytics – What should happen?
▪ Prescriptive Maintenance

Source: https://www.youtube.com/watch?time_continue=86&v=iMdY1wWrYeY&feature=emb_title



Skills Required for a Data Scientist

• Computer Science: computer programming, algorithms, data structures
• Math & Statistics
• Business/Domain Expertise

https://www.superiordatascience.com/datasciencecasestudy.html


Artificial intelligence vs. Machine learning vs. Deep learning

• Artificial intelligence: “Intelligent machines” which can solve problems, make/suggest decisions, and perform tasks that have traditionally required humans to solve. Ex. logic, if-then rules, and machine learning.
• Machine learning (INC 362): A subset of AI algorithms which learn without being explicitly programmed with rules. They use data to learn and match patterns.
• Deep learning ([Aj. Poj] Special Topic I: Deep Neural Network & Artificial Intelligent): Machine learning algorithms with a brain-like logical structure of algorithms called artificial neural networks.

https://www.superiordatascience.com/datasciencecasestudy.html
https://electronics360.globalspec.com/article/17406/deep-learning-and-its-applications


• 3-4 members per group
• Select a machine learning application
• Explore how it works
• Present for 3 minutes per group


Speech recognition
Listening to speech and transcribing it

Spotify analyzes its customers' data by collecting their listening history: which music genres they particularly like, or which types of podcast topics they listen to regularly. It then uses this to create content recommendations that suit each user better.

Marut Chumkhunthod (มารุต ชุ่มขุนทด), co-founder of Class Cafe


https://stepstraining.co/entrepreneur/7-example-brand-use-data-for-business
Face recognition
Detecting faces


Face Recognition

https://www.intechopen.com/chapters/52911



Computer Vision & Pattern Recognition


https://www.superiordatascience.com/datasciencecasestudy.html



https://medium.com/analytics-vidhya/machine-learning-algorithms-exhaustive-list-e69df578c883
Skills Required for a Data Scientist

• Computer Science: for example, programming, algorithms, and data structures
• Math & Statistics: mathematics and statistics
• Business/Domain Expertise: business knowledge

https://www.superiordatascience.com/datasciencecasestudy.html


The illustration of relations between data science, machine learning,
artificial intelligence, deep learning, and data mining.

https://www.altexsoft.com/blog/data-science-artificial-intelligence-machine-learning-deep-learning-data-mining/



Cross-industry standard process for data mining (CRISP-DM)


Human learning
For example, what is the age of the man in the image?

Not very easy, right? Is he 20, 30 or 40?
We can probably say he is not very old, but it is hard to be accurate about the exact age.

Now, which person in the two images is older?

Based on the wrinkles and silver hair, you can probably judge quickly that the second man is older.

Source: https://blog.ml.cmu.edu/2019/03/29/building-machine-learning-models-via-comparisons/



Machine learning vs. Human learning

Source: https://blog.ml.cmu.edu/2019/03/29/building-machine-learning-models-via-comparisons/

https://sravya-tech-usage.medium.com/traditional-programming-vs-machine-learning-e9bbed5e491c


Traditional programming vs. Machine learning

Traditional programming
Input (X = 5) + Function (X + 1) → Computer → Output (Y = 6)

Machine learning
Input (X = 5) + Output (Y = 6) → Computer → Function (X + 1)
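
The contrast can be sketched in a few lines of Python. In the first half the rule X + 1 is written by hand. In the second half the rule is estimated from example input/output pairs with an ordinary least-squares line fit. A single pair (X = 5, Y = 6) cannot determine a line, so the extra pairs below are assumed for illustration, and least squares is just one possible learning method, not something the slide prescribes.

```python
# Traditional programming: a person writes the rule.
def rule(x):
    return x + 1


# Machine learning: the rule is estimated from example (x, y) pairs.
# The pairs other than (5, 6) are assumed; one pair alone cannot define a line.
xs = [1, 2, 3, 5]
ys = [2, 3, 4, 6]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
# Ordinary least-squares estimates for y = slope * x + intercept.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x


def learned(x):
    return slope * x + intercept


print(rule(10), learned(10))
```

Here the learned function recovers the same rule because the example data follow it exactly. With noisy data the learned rule would only approximate it.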



Example: insurance company

Imagine a hypothetical insurance company that is striving for the best customer experience in the 21st-century digital environment, as well as preserving assets and their ROI.

So, the automatic detection of fraudulent claims is a part of their business processes.

Traditional programming assumes the rules for such detection are known in advance. Unfortunately, this is not always the case in real life: we do not always know exactly what rules a program should follow.
https://www.avenga.com/magazine/machine-learning-programming/



Example 1: Customer Churn

• Customer churn is the percentage of customers who stopped purchasing your business's products or services during a certain period.
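
As a worked version of this definition, the sketch below computes churn as the share of customers at the start of a period who stopped purchasing during it. The function name and the numbers are illustrative assumptions. Businesses differ in exactly how they count the starting customer base.

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Churn as a percentage of the customers present at the start of the period."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return 100.0 * customers_lost / customers_at_start


# Hypothetical quarter: 200 customers at the start, 10 stopped purchasing.
print(churn_rate(200, 10))
```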


If you feed in customer demographics and transactions as input, and whether each customer churned in the past as the observed output, the algorithm formulates a program (a model) that can predict whether someone will churn.
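
A minimal sketch of this input/output idea, using a k-nearest-neighbours vote written with the standard library. The slide does not name an algorithm, so k-NN is a stand-in chosen for brevity. The two features (months as a customer, purchases last quarter) and every record are hypothetical.

```python
from math import dist

# Hypothetical history: (months as customer, purchases last quarter) -> outcome.
history = [
    ((24, 12), "stayed"),
    ((30, 9), "stayed"),
    ((18, 10), "stayed"),
    ((3, 1), "churned"),
    ((5, 0), "churned"),
    ((2, 2), "churned"),
]


def predict_churn(customer, k=3):
    """Majority vote among the k past customers most similar to this one."""
    neighbours = sorted(history, key=lambda row: dist(row[0], customer))[:k]
    labels = [label for _, label in neighbours]
    return max(set(labels), key=labels.count)


print(predict_churn((4, 1)))    # resembles past churners
print(predict_churn((26, 11)))  # resembles past loyal customers
```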



Example 2: Late credit card bill payment


If you want to predict who will pay the bills late, identify the input (customer
demographics, bills) and the output (pay late or not)

Go to …
https://teachablemachine.withgoogle.com



Teachable Machine is a web tool that makes it fast and easy to create machine learning models for your projects, no coding required.

Train a computer to recognize your images, sounds, & poses, then export your model for your sites, apps, and more.


How does machine learning work?

https://postindustria.com/how-to-know-which-machine-learning-algorithms-to-use-techniques-in-machine-learning/



Dog vs. Cat

A machine-learned function, f, takes an image as input and returns whether it is a “cat” or a “dog”.

f is trained with labelled training images of cats and dogs.

“dog” is one category and “cat” is another. These categories are referred to as “classes” in classification problems.

What happens if we feed in images of monkeys labelled as “cat”?
Feeding garbage data during training leads to garbage predictions, making the entire process futile.
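
The effect of a bad label can be seen with a toy 1-nearest-neighbour classifier: it simply copies the label of the most similar training example, so a monkey wrongly labelled "cat" makes new monkey-like inputs come out as "cat". The two numeric features and all values are hypothetical stand-ins for real image data.

```python
from math import dist

# Hypothetical 2-feature descriptions of training images with their labels.
training = [
    ((2.0, 2.0), "cat"),
    ((2.2, 1.8), "cat"),
    ((6.0, 6.5), "dog"),
    ((6.3, 6.1), "dog"),
    ((9.0, 3.0), "cat"),  # actually a monkey, wrongly labelled "cat"
]


def predict(features):
    """1-nearest-neighbour: copy the label of the closest training example."""
    _, label = min(training, key=lambda row: dist(row[0], features))
    return label


# A new monkey-like image inherits the wrong label from the bad training row.
print(predict((8.8, 3.2)))
```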

Model: The representation of what a
machine learning system has learned from
the training data.



What is the meaning of features?

• In machine learning, features are used as inputs into a machine learning model.
• Features are also sometimes referred to as “variables” or “attributes.”
• For example, building a machine learning model predicting future sales of a store.

https://www.advancinganalytics.co.uk/blog/2022/1/13/a-beginners-guide-to-understanding-feature-stores



What are feature stores?

• A feature store is a centralized repository that stores curated features.
• It is a data management layer that allows data scientists, machine learning engineers and data engineers to collaborate, share and discover features.
• It takes raw data and then transforms it into features subsequently used for model training and inference.

https://www.advancinganalytics.co.uk/blog/2022/1/13/a-beginners-guide-to-understanding-feature-stores
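
The "raw data into features" step can be pictured as a small aggregation. The sketch below turns hypothetical raw transactions into per-store features that a sales model could consume. The store IDs, amounts, and feature names are invented, and a real feature store adds storage, versioning, and sharing on top of this kind of transformation.

```python
from collections import defaultdict

# Hypothetical raw transactions: (store_id, order amount).
raw_transactions = [
    ("S1", 120.0), ("S1", 80.0), ("S2", 50.0), ("S1", 100.0), ("S2", 70.0),
]


def build_features(transactions):
    """Aggregate raw rows into one feature record per store."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for store, amount in transactions:
        totals[store] += amount
        counts[store] += 1
    return {
        store: {
            "total_sales": totals[store],
            "n_orders": counts[store],
            "avg_order": totals[store] / counts[store],
        }
        for store in totals
    }


feature_table = build_features(raw_transactions)
print(feature_table)
```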



Model training vs. Model inference

• Model training refers to the process of using a machine learning algorithm to build a model.
• Model inference is the process of using a trained model to infer a result from new data.

https://galliot.us/blog/edge-deep-learning-p1/
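
The two phases can be separated cleanly in code: `fit` is training (build the model from labelled data) and `predict` is inference (apply the trained model to new data). The nearest-centroid classifier below is a deliberately tiny stand-in, and the class name, data, and labels are hypothetical.

```python
class NearestCentroidModel:
    """Tiny classifier used only to contrast training with inference."""

    def fit(self, samples, labels):
        # Training: learn one centroid (mean feature vector) per class.
        grouped = {}
        for features, label in zip(samples, labels):
            grouped.setdefault(label, []).append(features)
        self.centroids = {
            label: [sum(column) / len(rows) for column in zip(*rows)]
            for label, rows in grouped.items()
        }
        return self

    def predict(self, features):
        # Inference: return the class whose centroid is closest to the new sample.
        def squared_distance(label):
            return sum((c - f) ** 2 for c, f in zip(self.centroids[label], features))

        return min(self.centroids, key=squared_distance)


model = NearestCentroidModel().fit(
    samples=[(1, 1), (2, 2), (8, 8), (9, 9)],
    labels=["small", "small", "large", "large"],
)
print(model.predict((2, 1)), model.predict((7, 9)))
```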



How to deploy a model
• Features are used as inputs to train the machine learning model.
• After training and validation, the best model is deployed into production for inference.

https://www.advancinganalytics.co.uk/blog/2022/1/13/a-beginners-guide-to-understanding-feature-stores

CLASSIFICATION
Separating into groups having definite values
E.g., 0 or 1, cat or dog or orange etc.

House Price Linear Regression

• Estimate the price of the house given the size in square feet.
• A predictor line predicts the estimates of housing price.
Historical sales data: 1 Jan 2019 – 31 Mar 2019

How to forecast sales in Apr 2019?

Output: forecast sales in the upcoming month.
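
Three very simple baselines show what "forecast sales in the upcoming month" can mean in practice: repeat the last month, average the past months, or extend the average month-to-month change. The monthly totals are hypothetical, and these baselines are illustrations, not the method used in the course.

```python
# Hypothetical monthly sales totals for Jan-Mar 2019.
monthly_sales = {"2019-01": 120_000, "2019-02": 135_000, "2019-03": 150_000}
values = list(monthly_sales.values())

# Baseline 1: next month equals the most recent month.
naive_forecast = values[-1]

# Baseline 2: next month equals the average of the observed months.
moving_average_forecast = sum(values) / len(values)

# Baseline 3: continue the average month-to-month change.
average_change = (values[-1] - values[0]) / (len(values) - 1)
trend_forecast = values[-1] + average_change

print(naive_forecast, moving_average_forecast, trend_forecast)
```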

REGRESSION:
Estimating the most probable values or relationship among variables.
E.g., estimation of the price of the house based on size.



Types of Machine Learning



3 Types of machine learning

https://www.researchgate.net/figure/The-main-types-of-machine-learning-Main-approaches-include-classification-and_fig1_354960266
Unsupervised learning = no answers are given!!

Supervised learning = taught hand in hand!!
Example 1: supervised learning vs. unsupervised learning

Big nose and mouth

Small nose and mouth

Features: X1 = size of nose, X2 = size of mouth

• CLASSIFICATION: each example carries a label (Dog or Cat)
• CLUSTERING: no labels are given, and examples are grouped by similarity (big nose and mouth vs. small nose and mouth)


Classification vs. Clustering

CLASSIFICATION CLUSTERING

https://kevin-c-lee26.medium.com/machine-learning-101-classification-vs-clustering-e11b12c71243



CLUSTERING:
Clustering involves dividing data points into multiple clusters of similar
values.
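
A compact k-means sketch shows the idea of dividing points into clusters of similar values without any labels. k-means is used here because it appears in the course schedule. The points and the choice of starting centroids are assumptions made for the example.

```python
from math import dist


def k_means(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical unlabelled points forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]
centroids, clusters = k_means(points, centroids=[points[0], points[3]])
print(centroids)
print(clusters)
```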



Regression vs. Classification vs. Clustering

https://www.youtube.com/watch?v=rirAaZzjaoA



Department of Control Systems and Instrumentation Engineering
King Mongkut’s University of Technology Thonburi

Dr. Teema Leangarun


teema.lea@kmutt.ac.th
