
Implementing an Edge Computing

Apache Kafka Inference Engine

Douglas Eadline

© Copyright 2020, Basement Supercomputing, All rights Reserved.


Presenter

Douglas Eadline
deadline@basement-supercomputing.com
@thedeadline

• HPC/Hadoop Consultant/Writer
• https://www.basement-supercomputing.com
(Changing to https://www.limulus-computing.com)



Outline
• Segment 1: Introduction and Course Goals
• Segment 2: Problem Description
• Segment 3: Using Apache Kafka
• Break (10 mins)
• Segment 4: Using Keras (TensorFlow)
• Segment 5: Integrating Components
• Segment 6: Testing the Application
• Segment 7: Course Wrap-up, Questions, and Additional Resources



Segment 1

Introduction and
Course Goals



Recommended Approach To Class
• The course covers a lot of material!
• It is designed to get you started (the “hello.c” approach)
• Sit back and watch the examples
• All examples are provided in a notes file
• I will refer to the file throughout the class
(cut and paste)
• The notes files are available for download, along with
some help on installing software, from the class web page



Courses In Scalable Data Pipeline

1. Apache Hadoop, Spark, and Kafka Foundations (3 hours/1 day)
2. Beginning Linux Command Line for Data Engineers and Analysts (3 hours/1 day)
3. Intermediate Linux Command Line for Data Engineers and Analysts (3 hours/1 day)
4. Hands-on Introduction to Apache Hadoop, Spark, and Kafka Programming (6 hours/2 days)
5. Data Engineering at Scale with Apache Hadoop and Spark (3 hours/1 day)
6. Scalable Analytics with Apache Hadoop, Spark, and Kafka (3 hours/1 day)



Course Webpage

https://www.clustermonkey.net/scalable-analytics



Questions?



Segment 2

Problem Description



What is Edge Computing?
How to manage the data gap?



What is Edge Computing?
• Local computing (non-data center/cloud)
– Latency (very fast response)
– Traffic (data movement is expensive)
– Low noise, limited power, and low heat are important
– Need fast inference, data reduction and munging, and compute
• Where are Edge Systems?
– Lab
– Office
– Factory
– Home
– Isolated locations



Local Example: FDM 3D Printing
Fused Deposition Modeling



Printer (Lulzbot Taz 6)



What Can You Print?
Prototype, Production, Unique Parts
Various Materials, No Minimum



Our Example
• Lulzbot “Rocktopus”
• High-speed print
– 61 visible layers (high detail uses 129 layers)
• Height ~30 mm
• Width ~80 mm
• Takes about 45 minutes to print



What Can Go Wrong?
• Bed de-adhesion
– Results in “spaghetti”
– Results in a misaligned print



What Can Go Wrong?

• Object on the build plate
– Can jam the print head
– Can be moved through a collision with the print head
• Also, a bad first layer creates a bad print



How to “Keep an Eye” on the Print

1. Place a camera and take periodic pictures of the print
2. Collect the data in real time
3. Use a trained model to detect failure
4. Stop the printing process and send an alert if a
potential failure is detected



Printing Videos

Video Examples
– Good Print
– Failed Print
– Object on Print Bed

We will work with the snapshots used to make the videos



Our Resource List

• 3D Printer – Lulzbot Taz 6 (Cura for slicing)
• Raspberry Pi – used as the printer controller (USB to printer,
wireless to local LAN)
• OctoPrint Software – Raspberry Pi based software to control
the printer and record pictures (uses Octolapse for clean
images and video; it snaps an image at the completion of each
layer, with no print head in the picture)
• Kafka Cluster – collects (buffers/brokers) images coming
from the printers
• Keras/TensorFlow – used to build the model and for “inference”



General Data Flow
Images go to Kafka via a Producer and are then pulled out
by a Consumer.
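
The cut-and-paste examples are in the class notes file; the sketch below is
only a minimal illustration of the producer side using the kafka-python
package, where the broker address (localhost:9092), topic name
(printer-images), and snapshot filename are all assumptions:

    # Send one snapshot to a Kafka topic (illustrative sketch; names are assumptions)
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    with open("layer_0042.jpg", "rb") as f:   # hypothetical Octolapse snapshot
        producer.send("printer-images", value=f.read())
    producer.flush()  # block until the buffered message is actually sent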



Questions?



Segment 3

Using Apache Kafka



What is Kafka?
• Kafka is a “message broker”: it is used for real-time streams
of data, to collect big data, to do real-time analysis, or both
• Kafka is a high-throughput, scalable, and reliable
(replicated) service that records all kinds of data in real time
• Data can include logs, web apps, messages, manufacturing,
databases, weather, financial streams, and anything else
• Kafka collects data from a variety of sources (streams)
and timelines so that it can be easily and consistently
available for processing



Kafka Components
• A Kafka Producer is an application that acts as a source
of data in a Kafka cluster. A producer can publish messages
to one or more Kafka Topics.
• A Kafka Consumer is a client or program that consumes the
messages published by a Producer.
• A Kafka Topic is a category/feed name to which messages
are stored and published. Producer applications write data
to topics and consumer applications read from topics.
(A short consumer sketch follows below.)
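
A matching minimal consumer sketch, again assuming kafka-python, a broker on
localhost:9092, and the hypothetical printer-images topic:

    # Read messages from the topic and report what arrived (illustrative sketch)
    from kafka import KafkaConsumer

    consumer = KafkaConsumer("printer-images",
                             bootstrap_servers="localhost:9092",
                             group_id="print-watchers",     # hypothetical group name
                             auto_offset_reset="earliest")  # start from oldest data
    for message in consumer:
        print(f"partition={message.partition} offset={message.offset} "
              f"bytes={len(message.value)}")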



Kafka Brokers
• Sitting between Producer and Consumer applications is a
Kafka Broker
• Kafka topics are divided into a number of replicated
Partitions. Partitions allow you to parallelize a topic
(both writing and reading) by splitting the data across
multiple servers, each running a Kafka Broker. Partitions
can run in a distributed fashion on separate servers,
allowing for fast input (producers) and output (consumers).
In addition, partitions are replicated for redundancy.
(A topic-creation sketch follows below.)
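
As a sketch of how such a partitioned, replicated topic might be created with
kafka-python's admin client (the partition and replication counts are
illustrative assumptions, and a replication factor of 2 requires at least
two brokers):

    # Create a topic with multiple partitions and replicas (illustrative sketch)
    from kafka.admin import KafkaAdminClient, NewTopic

    admin = KafkaAdminClient(bootstrap_servers="localhost:9092")
    admin.create_topics([NewTopic(name="printer-images",
                                  num_partitions=9,        # e.g., one per image stream
                                  replication_factor=2)])  # copies for redundancy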



Kafka Cluster



Why Kafka?
3D printers can print multiple (different) objects at the same time.
• Sometimes many copies of the same object (using “mirrored” print heads)
• Sometimes all custom objects
• Ideal to watch each object separately (e.g., nine image streams)



Why Kafka?
Multiple printers may each have their own number of streams,
e.g., four printers each sending nine separate image streams
results in 36 total streams. (A keyed-producer sketch follows below.)
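
One way to keep many streams organized is to key each message by printer and
stream; Kafka maps a given key to the same partition every time, so each
stream stays ordered. A sketch, with the naming scheme as an assumption:

    # Tag each snapshot with a printer/stream key (illustrative sketch)
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers="localhost:9092")

    def send_snapshot(printer_id, stream_id, jpeg_bytes):
        # Messages with the same key always land in the same partition
        key = f"{printer_id}-{stream_id}".encode("utf-8")
        producer.send("printer-images", key=key, value=jpeg_bytes)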



Questions?



Break



Segment 4

Using Keras (TensorFlow)
to Build a Model



Keras Implementations
Two ways to use Keras
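
The model built in class is in the notes file; purely as a rough sketch of
the tf.keras style (the Keras API bundled with TensorFlow), here is a small
binary classifier for good-versus-failing snapshots, where the input size
and layer sizes are illustrative assumptions:

    # Tiny CNN sketch: probability that a snapshot shows a failing print
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        keras.Input(shape=(128, 128, 3)),         # assumed snapshot size
        layers.Conv2D(16, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),    # probability of failure
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])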



Segment 5

Integrating Components



Edge Inference
Complete Local Data Flow for the Printing Process
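
Tying the pieces together, a consumer can pull snapshots out of Kafka and
score them with the trained model. A sketch assuming kafka-python, Pillow for
image decoding, and a hypothetical saved model file; the topic, input size,
and alert threshold follow the earlier assumptions:

    # Pull snapshots from Kafka and score them with the trained model (sketch)
    import io
    import numpy as np
    from kafka import KafkaConsumer
    from PIL import Image
    from tensorflow import keras

    model = keras.models.load_model("print_fail_model.h5")  # hypothetical file

    consumer = KafkaConsumer("printer-images",
                             bootstrap_servers="localhost:9092",
                             group_id="inference")
    for msg in consumer:
        img = Image.open(io.BytesIO(msg.value)).convert("RGB").resize((128, 128))
        x = np.asarray(img, dtype="float32")[np.newaxis] / 255.0
        p = float(model.predict(x, verbose=0)[0, 0])
        if p > 0.5:  # alert threshold is an assumption
            print(f"Possible failure on stream {msg.key}: p={p:.2f}")
            # here one would pause the printer (e.g., via OctoPrint's REST API)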



Segment 6

Testing the Application


?????



Segment 7

More Resources
Course Wrap-up



Edge Takeaways

• A data center or cloud may not be the best solution when
latency, data volume, data movement, and environment are important
• Data collection and inference happen on the Edge
• Multiple data streams can get messy; use Kafka as an
organized “data sponge”
• Keras is an easy way to build and train a model
• All Data Science projects are iterative and take
patience and practice



Thank You from the Rocktopus Family

Questions?

