
IoT & Machine Learning: From Data Collection to Model Implementation

Dan Murphy
Tuesday, June 16, 2020
What We’ll Cover

1. Data Collection: Best Practices, Strategies, and What to Look Out For

2. Machine Learning Models: Support Vector Machine, k-Nearest Neighbors, One-Hot Encoding, Metrics, and more!

3. Deploying Your Model: Streamlit vs. Flask, Model Maintainability, Accessibility and Design

Presentation Overview
1. Part 1 - Data Collection: Motivation for the project, how I collected the data

2. Part 2 - Data Cleaning: Importing the data, restructuring it, and preparing it for machine learning

3. Part 3 - Building Your Model(s): Selecting your model, scikit-learn implementations, evaluation metrics

4. Part 4 - Deploying Your Model: Deployment strategy, upkeep


Part One: Data Collection
Motivation
1. Challenge of learning IoT and working with sensor data

2. Build my first ‘start-to-finish’ Machine Learning project


How I Collected the Data
1. Connect sensors to ESP8266 Board (Random Nerd Tutorials)

2. Integrate with IFTTT

3. Connect board and sensors to outlet / power source
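The ESP8266-to-IFTTT step can be sketched as the HTTP request the board sends. This is a minimal sketch that assumes the IFTTT Webhooks service. Its trigger URL has the form https://maker.ifttt.com/trigger/{event}/with/key/{key}, and its JSON body carries up to three fields, value1 to value3. The event name, the key and the sensor readings below are hypothetical placeholders. On the board, the same request would be made from its firmware; Python is used here only to show the shape of the request.

```python
import json
import urllib.request

# Hypothetical placeholders: substitute your own IFTTT event name and Webhooks key.
EVENT_NAME = "sensor_reading"
WEBHOOKS_KEY = "YOUR_IFTTT_KEY"


def build_ifttt_request(event, key, value1, value2, value3):
    """Return the trigger URL and JSON body for an IFTTT Webhooks event.

    Webhooks passes at most three values (value1..value3) on to the applet.
    """
    url = f"https://maker.ifttt.com/trigger/{event}/with/key/{key}"
    body = json.dumps({"value1": value1, "value2": value2, "value3": value3})
    return url, body


def send(url, body):
    """POST the JSON body to IFTTT and return the HTTP status code (needs network access)."""
    req = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


# Made-up readings; which sensor maps to which value is an assumption.
url, body = build_ifttt_request(EVENT_NAME, WEBHOOKS_KEY, 22.5, 41.0, 1013.2)
print(url)
print(body)
```

An applet might log each request to a spreadsheet. If so, the payload fields arrive as generic value columns, which may explain the "Value3" column cleaned in Part Two. That link is an assumption.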


Part Two: Data Cleaning
Data Cleaning
1. Change Column Names to Be Representative of the Data Stored in That Column

2. Fix the “Value3” Column

• Should I Remove or Replace Outliers / Anomalies / ‘Bad’ Data?

3. Normalization Techniques

• Min-Max vs. Standardization

4. One-Hot-Encode the “Date” Column
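The four cleaning steps above can be sketched with pandas and scikit-learn. This is a minimal sketch on a small made-up frame. The raw column names (Value1 to Value3), the new names, the valid pressure range and the -999.0 error value are all hypothetical. The sketch covers the following:
- both answers to the remove-or-replace question: dropping the bad rows, or substituting the median of the good readings
- both normalization options
- one-hot encoding of "Date" with pd.get_dummies

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical raw export: column names and values are made up for illustration.
raw = pd.DataFrame({
    "Date": ["2020-06-01", "2020-06-01", "2020-06-02", "2020-06-02", "2020-06-03"],
    "Value1": [21.5, 22.0, 21.8, 22.4, 21.9],
    "Value2": [40.0, 42.0, 41.0, 39.0, 43.0],
    "Value3": [1012.0, 1013.5, -999.0, 1011.8, 1014.2],  # -999.0 stands in for a bad reading
})
df = raw.copy()  # keep the raw frame untouched

# 1. Rename columns to describe what they hold (new names are assumptions).
df = df.rename(columns={"Value1": "temperature_c",
                        "Value2": "humidity_pct",
                        "Value3": "pressure_hpa"})

# 2. Flag out-of-range readings in the former "Value3" column.
bad = ~df["pressure_hpa"].between(900, 1100)  # plausible range is an assumption
removed = df[~bad]                            # option A: drop the bad rows
replaced = df.copy()                          # option B: replace with the median of good readings
replaced.loc[bad, "pressure_hpa"] = df.loc[~bad, "pressure_hpa"].median()

# 3. Min-max scaling vs. standardization of the numeric features.
num_cols = ["temperature_c", "humidity_pct", "pressure_hpa"]
minmax = pd.DataFrame(MinMaxScaler().fit_transform(replaced[num_cols]), columns=num_cols)
standard = pd.DataFrame(StandardScaler().fit_transform(replaced[num_cols]), columns=num_cols)

# 4. One-hot encode the "Date" column: one indicator column per distinct date.
encoded = pd.get_dummies(replaced, columns=["Date"], prefix="date")
print(encoded.columns.tolist())
```

Min-max scaling maps each column onto [0, 1] and is sensitive to extreme values, which is one reason to fix outliers first. Standardization rescales each column to zero mean and unit variance. In a real pipeline, fit either scaler on the training split only and reuse it on the test split.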


[Figure: Data Cleaning, Before / After]
Additional Tips
1. df.columns = df.columns.str.replace(' ', '_')

2. List comprehensions

3. Keep a copy of your raw data frame
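The three tips can be combined in a short pandas sketch. df.columns.str.replace returns a new Index rather than renaming in place, so the result has to be assigned back to df.columns. The column names and the Celsius-to-Fahrenheit conversion are hypothetical. They are chosen only to give the list comprehension something to select.

```python
import pandas as pd

# Hypothetical column names containing spaces.
raw = pd.DataFrame({"Sensor Date": ["2020-06-01"], "Outside Temp": [21.5], "Room Temp": [23.0]})

# Tip 3: keep the raw frame untouched and do all cleaning on a copy.
df = raw.copy()

# Tip 1: str.replace returns a new Index, so assign it back to df.columns.
df.columns = df.columns.str.replace(" ", "_")

# Tip 2: a list comprehension selects every temperature column in one line.
temp_cols = [col for col in df.columns if col.endswith("_Temp")]
df[temp_cols] = df[temp_cols] * 1.8 + 32  # Celsius -> Fahrenheit, illustrative only

print(list(raw.columns))  # the raw frame keeps its original names
print(list(df.columns))
```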


Part Three: Building Your Model(s)
Selecting Your Model
1. Start with your assumptions, build from there

2. Don’t over-complicate things

3. Build up an intuition
scikit-learn
1. Support Vector Machine

2. k-Nearest Neighbors
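Fitting the two scikit-learn models takes only a few lines. This sketch uses synthetic data from make_classification as a stand-in for the sensor features, so the printed accuracies say nothing about the project's results. The scaling step is an assumption, and so are the RBF kernel, C=1.0 and k=5. These are scikit-learn's defaults for SVC and KNeighborsClassifier, not the author's settings.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the sensor features and labels (not the project's data).
X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Scale inside a pipeline so the scaler is fit on the training split only.
models = {
    "SVM (RBF kernel)": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
    "k-NN (k=5)": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # mean accuracy on the held-out split
    print(f"{name}: test accuracy = {scores[name]:.3f}")
```

Putting StandardScaler in the pipeline matters for both models. k-NN compares raw distances and the RBF kernel is built on them, so a feature measured in large units would otherwise dominate.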
Evaluation Metrics
1. Confusion Matrix (TP rate, FP rate)

2. Test Accuracy vs. Algorithm Parameter(s)

3. ROC Curve
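The three evaluation tools can be sketched on the same kind of synthetic data:
- the TP and FP rates come straight from the confusion-matrix counts, TPR = TP / (TP + FN) and FPR = FP / (FP + TN)
- the accuracy-vs-k loop is one way to produce a "test accuracy vs. algorithm parameter" curve
- roc_curve sweeps a threshold over SVC.decision_function scores

The data, the k range and the model settings are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# 1. Confusion matrix -> true-positive rate and false-positive rate.
svm = SVC(kernel="rbf", C=1.0).fit(X_train, y_train)
tn, fp, fn, tp = confusion_matrix(y_test, svm.predict(X_test)).ravel()
tpr = tp / (tp + fn)  # sensitivity (recall)
fpr = fp / (fp + tn)  # 1 - specificity
print(f"TP rate = {tpr:.3f}, FP rate = {fpr:.3f}")

# 2. Test accuracy vs. an algorithm parameter (k for k-NN).
k_values = range(1, 26, 2)
accuracies = [KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train).score(X_test, y_test)
              for k in k_values]
best_k = k_values[int(np.argmax(accuracies))]
print(f"best k on this split: {best_k}")

# 3. ROC curve: every threshold over the SVM decision scores gives one (FPR, TPR) point.
scores = svm.decision_function(X_test)
fprs, tprs, thresholds = roc_curve(y_test, scores)
print(f"ROC AUC = {roc_auc_score(y_test, scores):.3f}")
```

Choosing k by test accuracy lets the test set leak into model selection. A validation split or cross-validation (for example sklearn.model_selection.cross_val_score) keeps the final test score honest. The fprs and tprs arrays can be handed to a plotting library to draw the ROC curve itself.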
Part Four: Deploying Your Model
Deployment Strategy
1. Streamlit vs. Flask

• Streamlit: Limited Control of UI, Great for internal tool-building

• Flask: Template rendering, Routing, Great for consumer-facing products

2. Accessible Design

• Color contrast, alt text, font type and weight
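Color contrast can be checked numerically rather than by eye. This sketch implements the WCAG 2.x contrast-ratio formula in three steps:
- linearize each sRGB channel
- combine the channels into a relative luminance L = 0.2126R + 0.7152G + 0.0722B
- take the ratio (L_lighter + 0.05) / (L_darker + 0.05)

The 4.5:1 (normal text) and 3:1 (large text) thresholds are the WCAG AA levels. The helper names and example colors are hypothetical.

```python
def _linearize(channel_8bit: int) -> float:
    """Convert one sRGB channel (0-255) to linear light, per the WCAG 2.x definition."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a '#RRGGBB' color, from 0 (black) to 1 (white)."""
    hex_color = hex_color.lstrip("#")
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """WCAG contrast ratio (L_lighter + 0.05) / (L_darker + 0.05), between 1 and 21."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(fg: str, bg: str, large_text: bool = False) -> bool:
    """WCAG AA: at least 4.5:1 for normal text, 3:1 for large text."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)


print(round(contrast_ratio("#000000", "#FFFFFF"), 2))
print(passes_aa("#777777", "#FFFFFF"))
```

A mid gray such as #777777 on white comes out just under 4.5:1. It therefore fails AA for body text but passes for large text, a near miss that is hard to spot visually.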


Model Upkeep
• Model Drift
1. Concept Drift: Properties of the dependent (target) variable change

2. Data Drift: Properties of the independent (feature) variables change

3. Upstream Drift: Operational changes in the data pipeline

• Class Imbalance
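The drift and imbalance items can be monitored with simple checks. This sketch addresses data drift only. It uses a two-sample Kolmogorov-Smirnov test (scipy.stats.ks_2samp) to compare a feature's training distribution against recent readings. That test is a swapped-in technique, not one named in the slides. Concept drift cannot be caught this way because detecting it needs fresh labels. The data, the 0.01 threshold and the 90/10 label split are made up.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Synthetic stand-ins: one feature at training time vs. in recent production data.
train_feature = rng.normal(loc=22.0, scale=1.0, size=500)
recent_feature = rng.normal(loc=23.5, scale=1.0, size=500)  # mean shifted on purpose

# Data drift check: do the two samples look like they come from the same distribution?
result = ks_2samp(train_feature, recent_feature)
drifted = result.pvalue < 0.01  # significance threshold is an assumption
print(f"KS statistic = {result.statistic:.3f}, drift flagged: {drifted}")

# Class imbalance check: share of each label in the training targets.
labels = np.array([0] * 450 + [1] * 50)  # hypothetical 90/10 split
classes, counts = np.unique(labels, return_counts=True)
shares = dict(zip(classes.tolist(), (counts / counts.sum()).tolist()))
print(shares)
```

With imbalanced classes, accuracy alone can look good while the minority class is missed. scikit-learn's SVC accepts class_weight="balanced", which reweights classes inversely to their frequency.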
Contact Information

E-mail: danielmurphy8@gmail.com

LinkedIn: Profile

Link to Project: Heroku App

Repository: GitHub
