
Degree project in Technology

First cycle, 15 HP

Real-time object detection robot control
Investigating the use of real time object detection on a
Raspberry Pi for robot control

SIMON RYBERG AND JONATHAN JANSSON

Real-time object detection robot control
Investigating the use of real time object detection on a
Raspberry Pi for robot control

Simon Ryberg and Jonathan Jansson

Bachelor’s Programme in Mechanical engineering

Date: 2022-05-09
Supervisor: Martin Grimheden
Examiner: Martin Grimheden
School of Industrial Engineering and Management
Swedish title: Autonom robotstyrning via realtidsbildigenkänning
Swedish subtitle: Undersökning av användningen av realtidsbildigenkänning på en
Raspberry Pi för robotstyrning

TRITA-ITM-EX 2022:104 Jonathan Jansson Simon Ryberg

© 2022 Simon Ryberg and Jonathan Jansson

Abstract

The field of autonomous robots has been explored more and more over the last decade. The
combination of advances in machine learning and increases in computational power has created
possibilities to explore the use of machine learning models on edge devices. Object detection on edge
devices is bottlenecked by the devices' limited computational power and is therefore more constrained
than running machine learning models on other hardware. This project explored the possibility of
using real-time object detection on a Raspberry Pi as input to different control systems. The Raspberry
Pi, with the help of a Coral USB Accelerator, was able to find a specified object and drive to it, and it
did so successfully with all the control systems tested. As the robot was able to navigate to the specified
object with all control systems, the possibility of using real-time object detection in faster-paced
situations can be explored.

Keywords

Edge Device

Raspberry Pi

Image recognition

Machine learning

Tracked robot

Track drive

Sammanfattning

Ämnet autonoma robotar har blivit mer och mer undersökt under det senaste årtiondet.
Kombinationen av framsteg inom maskininlärning och ökad beräkningskraft hos datorer och chip har
gjort det möjligt att undersöka användningen av maskininlärningsmodeller på edge-enheter.
Användandet av bildigenkänning på edge-enheter är begränsat av enheternas begränsade datorkraft
och har därför fler begränsningar jämfört med om bildigenkänning används på en annan typ av enhet.
Det här projektet har undersökt möjligheten att använda bildigenkänning i realtid som insignal till
kontrollsystem på en Raspberry Pi. Raspberry Pien lyckades, med hjälp av en Coral USB Accelerator,
lokalisera och köra till ett specificerat objekt, och gjorde detta med alla kontrollsystem som testades.
Eftersom roboten lyckades med detta öppnas möjligheten att använda bildigenkänning på
edge-enheter i snabbare situationer.

Nyckelord

Edge device

Raspberry Pi

Maskininlärning

Bandvagnsrobot

Banddrivlina

Bildigenkänning

Acknowledgments

We would like to thank all our supportive classmates for their efforts in helping us with knowledge and
motivation. A special thanks goes to Martin Grimheden and the assistants Tore Malmström and Algot
Lindestam for their support and assistance when needed.

Stockholm, May 2022


Simon Ryberg and Jonathan Jansson

Table of contents

1 Introduction 11

1.1 Background 11

1.2 Purpose 11

1.3 Scope 11

1.4 Goals 11

1.5 Research Methodology 12

1.5.1 Training object detection model 12

1.5.2 Construction of robot 12

1.5.3 Evaluating different control systems 12

2 Background 13

2.1 Previous research 13

2.2 Hardware 13

2.2.1 Raspberry Pi 13

2.2.2 Motor driver 14

2.2.3 Tensor processing unit 15

2.3 Software 15

2.3.1 OpenCV 15

2.3.2 Tensorflow 15

2.4 Control systems 15

2.4.1 Control system input 16

2.4.2 Bang-bang-controller 16

2.4.3 P-controller 16

2.4.4 PI-controller 16

2.4.5 PID-controller 16

3 Method 17

3.1 Experiment design 17

3.2 Data collection 17

3.3 Planned data analysis 17

3.4 Electronic power plant and motors 18

3.5 Prototyping process 18

3.6 Control system 18

3.6.1 Input to control-systems 18

3.6.2 Bang-Bang-controller 19

3.6.3 P-controller 19

3.6.4 PI-controller 19

3.6.5 PID-controller 19

3.7 Training object detection model 20

3.8 Object detection on the Raspberry Pi 21

4 Results 22

4.1 Construction 22

4.1.1 First prototype 22

4.1.2 Final robot 23

4.2 Results of different control systems 25

4.2.1 Bang-bang 25

4.2.2 P-controller 26

4.2.3 PI-controller 26

4.2.4 PID-controller 27

4.2.5 Evaluation of the image detection model 28

5 Discussion 29

5.1 Different control-systems 29

5.2 Hardware's impact on the robot 29

5.3 Different disturbances for the object detection 30

5.4 The trained model 30

6 Conclusion 31

6.1 Conclusions 31

6.2 Limitations 31

6.3 Future work and improvements 31

6.3.1 Control-systems 31

6.3.2 Practical purpose 31

6.3.3 Limit of real time object detection on edge devices 32

References 33

Appendix 34

Code: 34

List of Figures

Figure 1: GPIO pins on a Raspberry Pi 4 13


Figure 2: PWM signals 14
Figure 3: HAT-MDD10 14
Figure 4: Testing track 17
Figure 5: EfficientDet model 21
Figure 6: First prototype 22
Figure 7: CAD-model of the robot 23
Figure 8: CAD-model of the robot 23
Figure 9: CAD-model of the robot 24
Figure 10: The robot in complete stage 24
Figure 11: Bang-bang-controller results 25
Figure 12: P-controller results 26
Figure 13: PI-controller results 26
Figure 14: PID-controller results 27
Figure 15: PID-controller results 27
Figure 16: Detection of object in different conditions 28

List of Tables

Table 1: Image detection model node indexes 21

List of acronyms and abbreviations

CV Computer vision
DC Direct current
FPS Frames per second
GPIO General purpose input/output
ITM Industrial engineering and management
PWM Pulse width modulation
TF Tensorflow
TFlite Tensorflow lite
TPU Tensor processing Unit
USB Universal serial bus

1 Introduction

1.1 Background
Image recognition is becoming more widely used and is today applied in a plethora of different
applications; one recent example is its use in autonomous vehicles. An interesting sub-genre of
machine learning applications is those that run on edge devices. Edge devices are devices with limited
computational power and can therefore not be used in all applications; some examples of edge devices
are Arduinos, Raspberry Pis and mobile devices. An example of image recognition on edge devices is
apps that can classify objects, for example mushrooms, and tell which are edible and which are not. An
interesting application of image recognition is using it as input to a control system to create a small
autonomous robot that runs on an edge device, which is what this study aims to explore.

1.2 Purpose
To evaluate the feasibility of using real-time object recognition on an edge device, such as the
Raspberry Pi 4, and of using that information to control a tracked robot to a specified object. Two
research questions were formulated to investigate this feasibility:

1. Is it possible to control and guide a tracked robot to an object using a real-time image
detection system on an edge device such as a Raspberry Pi?

2. If research question one is possible, can the drive path be optimized using control feedback
systems such as P-, PI- and PID-controllers?

1.3 Scope
The time frame for this project is limited to one complete semester of about four months. This allows
for the development of a properly functioning prototype as well as research and development of an
image detection control system and an image detection model. Parts for the prototype are sourced
from old scrapped projects or bought if needed. The robot's main purpose is to evaluate the feasibility
of using real-time object detection on an edge device for robot control. The robot's ability to do tasks
other than driving towards a specified object will not be explored in this project. The project is limited
computationally, as all computations will be done on an edge device, the Raspberry Pi 4, with help
from a Coral USB Accelerator. Only one way of controlling a robot with real-time object detection will
be explored.

1.4 Goals
The main goal of this project is to construct a robot that, with input from an object detection model on
a Raspberry Pi 4, can drive towards objects, and to answer the research questions mentioned in
1.2 Purpose. The robot is supposed to be a solid testbed which allows multiple designs of control
systems with object detection as input to be evaluated. This gives the opportunity to properly design,
test and evaluate the possibilities and limitations of different control systems with object detection as
input.

1.5 Research Methodology

By constructing a robot that can be controlled by an edge device, the feasibility of using real-time
object detection in control systems can be tested and observed, thus answering the research questions.
By collecting images of a given object, the object detection algorithm can be trained to detect that
specific object. The robot, in conjunction with the image detection algorithm, is then used to test
different control systems in order to evaluate the feasibility of controlling a robot with real-time object
detection.

1.5.1 Training object detection model

The object detection model was trained on a dataset of 288 images of the object which the robot was
supposed to locate. The images were then labeled to create XML files containing the class number and
where in the picture the object was located. The labeling of the pictures was done with the program
LabelImg [1]. The images and XML files were then separated into three categories: one for training,
one for testing and one for validating the image detection algorithm. The object detection model used
was EfficientDet-Lite0, trained using TensorFlow.

1.5.2 Construction of robot

The construction of the robot began with a rough prototype to allow quick testing of the concept.
Different tests, such as motor tests, motor driver tests and driveability tests, were done to fully
understand how a tracked vehicle satisfying our needs should be designed. A further developed robot
was then constructed with the knowledge gained from the first prototype, in order to obtain a final
robot that fully satisfies our needs and allows accurate testing with high durability.

1.5.3 Evaluating different control systems

In order to evaluate the control systems, multiple control systems were tested and compared in a clean
lab with consistent conditions. A navigation test was set up where the robot is placed at a fixed distance
and a fixed angle to the object. The same test was run multiple times with different control systems to
test how well the image recognition system works together with each control system. Conditions such
as the lighting and the background of the object were also tweaked in order to evaluate the accuracy
and consistency of the image recognition system.

2 Background

2.1 Previous research

Different types of autonomous robots have been built previously; one example is found in the research
article “Color Based Object Recognition, Localization, and Movement Control for Multiple Wheeled
Mobile Robots” by Gen'ichi Yasuda and Bin Ge [2]. Inspiration was taken from the article on how the
centroid of an object can be used to describe the object's location within a picture. The article used
color to locate the specified object, which differs from the general-purpose image detection model used
in this project, but the logic of describing an object's location within a picture is not tied to a specific
type of object recognition and can therefore still be used.

The article “Efficientnet-Lite and Hybrid CNN-KNN Implementation for Facial Expression
Recognition on Raspberry Pi” by Mohd Nadhir Ab Wahab, Mohd Halim Mohd Noor and Muhammad
Firdaus Akbar [3] was an inspiration for the project. The implementation of certain image detection
models in their research gave an indication of which image detection models are feasible to use on a
Raspberry Pi. Their research also showed that using TensorFlow Lite instead of TensorFlow increased
inference speed by 51 times, with the downside being a 0,17 % loss in accuracy. TensorFlow Lite was
therefore chosen, as inference speed, the speed at which the tensors of an image detection model can
be calculated, is an important factor in the project.

2.2 Hardware

2.2.1 Raspberry Pi

Raspberry Pi is a series of low-cost, high-performance single-board computers produced by the
UK-based Raspberry Pi Foundation [4]. The type of Raspberry Pi used in this project is the Raspberry
Pi 4 with 8 GB of RAM. The Raspberry Pi 4 is equipped with 40 GPIO (general purpose input/output)
pins; in figure 1 the pins and their usage are shown.

Figure 1. GPIO pins on a Raspberry Pi 4, Raspberry Pi Foundation [23]

The Raspberry Pi has some pins with hardware PWM (pulse width modulation) capability, and all pins
support software PWM [5]. PWM, or pulse width modulation, is a way of getting an analog-like output
by digital means. A digital output switches on and off to create square waves that simulate voltage
values between the pin's highest output voltage and 0 volts [6]. In figure 2, PWM signals are
illustrated.

Figure 2. PWM signals, T. Hirzel, Basics of PWM (Pulse Width Modulation) [24]

Electrical components can be wired to the pins on the Raspberry Pi to be controlled by it, or to give
inputs to the Raspberry Pi. In addition to connecting individual components to the pins, driver cards
can be connected to several of the pins, allowing motors or other electrical components with higher
voltage requirements to be connected, as their power is supplied from a separate source and not from
the Raspberry Pi.

2.2.2 Motor driver

A motor driver enables motors to be driven by a microcontroller or a computer, and is needed because
the microcontroller or computer does not operate at the same current level as the motor [7]. A motor
driver takes low-current signals from the device sending it inputs and then provides high-current
output signals to the motors connected to it. The motor driver used in this project, the HAT-MDD10
shown in figure 3, has a header for all 40 GPIO pins, which allows it to be mounted directly on a
Raspberry Pi 4. The HAT-MDD10 can handle motor voltages between 6 and 24 volts and allows PWM
to be used to control the motors.

Figure 3. HAT-MDD10, Electrokit [25]
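As a rough illustration of how the driver is used from Python, the sketch below drives the two motors at different PWM duty cycles so that the robot turns. The pin numbers are those used in the program in the appendix; the duty cycles and the run time are only illustrative.

# Minimal sketch of differential drive through the HAT-MDD10.
# Pin numbers match the appendix program; speed values are only illustrative.
import time
import RPi.GPIO as GPIO

GPIO.setmode(GPIO.BCM)
AN1, AN2 = 12, 13    # PWM (speed) pins for motor 1 and motor 2
DIG1, DIG2 = 26, 24  # direction pins for motor 1 and motor 2

for pin in (AN1, AN2, DIG1, DIG2):
    GPIO.setup(pin, GPIO.OUT)

p1 = GPIO.PWM(AN1, 100)      # 100 Hz software PWM on motor 1
p2 = GPIO.PWM(AN2, 100)      # 100 Hz software PWM on motor 2

GPIO.output(DIG1, GPIO.LOW)  # both direction pins LOW -> both motors forward
GPIO.output(DIG2, GPIO.LOW)
p1.start(100)                # one track at full duty cycle
p2.start(60)                 # the other track slower, so the robot turns
time.sleep(2)

p1.stop()
p2.stop()
GPIO.cleanup()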

2.2.3 Tensor processing unit

The Coral USB Accelerator is a tensor processing unit (TPU) that can be connected to a computer such
as the Raspberry Pi to accelerate inference [8], that is, the speed at which the system calculates the
tensors of the image detection model. Accelerated inference speeds up the image refresh rate and
thereby the refresh rate of the control system.
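As a minimal sketch of how accelerated inference is set up, the TFLite interpreter is created with the Edge TPU delegate, mirroring the appendix program; the model path matches the appendix and the dummy input is only there to show the call sequence.

# Sketch: run inference on the Coral USB Accelerator with tflite_runtime.
import numpy as np
from tflite_runtime.interpreter import Interpreter, load_delegate

interpreter = Interpreter(
    model_path='KEX/edgetpu.tflite',
    experimental_delegates=[load_delegate('libedgetpu.so.1.0')])
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
height, width = input_details[0]['shape'][1], input_details[0]['shape'][2]

# One dummy uint8 image, just to show the call sequence
dummy = np.zeros((1, height, width, 3), dtype=np.uint8)
interpreter.set_tensor(input_details[0]['index'], dummy)
interpreter.invoke()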

2.3 Software

2.3.1 OpenCV

OpenCV (Open Source Computer Vision Library) is an open-source computer vision library used for
computer vision applications; it contains over 2500 optimized algorithms for computer vision and
machine learning [9]. OpenCV has modules which allow a program to access a device's webcam or
other cameras connected to the device. OpenCV can connect to a Raspberry Pi camera module and
provide a program with frames from its video feed that can be used in the rest of the program. OpenCV
also has modules to resize images and perform other image manipulation [10].
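A minimal sketch of how OpenCV is used to grab and prepare a frame for the detection model; the camera resolution matches the project setup, while the 320x320 model input size is an assumption for EfficientDet-Lite0.

# Sketch: grab one frame from the camera with OpenCV and resize it
# to the input size expected by the detection model.
import cv2

cap = cv2.VideoCapture(0)      # open the first camera
cap.set(3, 640)                # frame width
cap.set(4, 480)                # frame height

ok, frame = cap.read()         # BGR image as a NumPy array
if ok:
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    resized = cv2.resize(rgb, (320, 320))   # assumed EfficientDet-Lite0 input size
cap.release()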

2.3.2 Tensorflow

TensorFlow is an ecosystem which provides workflows to train and deploy machine learning models [11].
TensorFlow has a special library for edge devices called TensorFlow Lite. Models can be trained in
regular TensorFlow and then converted to TFLite (TensorFlow Lite) models. TFLite models have
several advantages over normal TensorFlow models: they are smaller and have faster inference than
their TensorFlow counterparts, which allows them to run better on devices with limited memory and
computational power [12]. TFLite models can be further optimized by quantizing the weights in the
model. Quantizing the weights makes them faster to compute and more memory efficient [12]; for
example, instead of storing the weights as 32-bit float values, they are stored as 16-bit float values or
8-bit integer values. Quantizing the weights is also a prerequisite for doing inference on a TPU, where
the model must have 8-bit integer weights [12].
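A hedged sketch of such a conversion with full int8 quantization, using the standard TensorFlow Lite converter API; the saved model path and calibration images are placeholders, and in this project the corresponding step was performed through the Google Colab notebook described in section 3.7.

# Sketch: convert a saved TensorFlow model to a fully int8-quantized TFLite model.
import numpy as np
import tensorflow as tf

# Placeholder calibration data: replace with a few hundred real training images.
representative_images = [np.zeros((320, 320, 3), dtype=np.float32) for _ in range(10)]

def representative_dataset():
    for image in representative_images:
        yield [image[None, ...]]             # add a batch dimension

converter = tf.lite.TFLiteConverter.from_saved_model('saved_model')  # path is a placeholder
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

with open('model_int8.tflite', 'wb') as f:
    f.write(converter.convert())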

2.4 Control systems


To control the outputs to the motors, a control system is used. Different types of control systems give
different benefits and differ in how complex they are to create. The types of controllers used are the
bang-bang controller, P-controller, PI-controller and PID-controller.

2.4.1 Control system input

The input value to the control system was the position of the specified object in the webcam footage.
The reference value of all the control systems was set to 0,5 on the x-axis. The value 0,5 was chosen
because it is the middle of the screen, with 0 being the leftmost value and 1 the rightmost value. If the
control system steers towards the reference value, the robot drives straight towards the given object.

2.4.2 Bang-bang-controller

A bang-bang controller reacts to the error with full action in one direction at a time. If the current value
is below the target value, 100 % of the possible action is applied until the value is above the target
value; then 100 % of the possible action is applied in the opposite direction until the value is below the
target value again. This leads to a very rough path to the target value.

2.4.3 P-controller

A P-controller is a proportional controller which reacts proportionally to its input error. A comparison
can be made with a robot driving on a 2D surface: a bang-bang controller turns left at a constant rate,
whereas a P-controller turns left in proportion to how far to the right the robot is and therefore how
much it has to turn left.

2.4.4 PI-controller

A common problem with the P-controller is that the difference between the set value in the control
system and the current value never disappears; the current value never reaches its reference value,
which results in a steady-state error [13]. To compensate for this, a PI-controller can be implemented.
This adds an integral part to the equation which constantly accumulates the error, and depending on
the size of the accumulated error, an integral term is added to the compensating equation. This
constantly pushes the next output closer to the target value [14].

2.4.5 PID-controller

The steering output of the PI-controller can overcompensate for the error through the integral
correction. This can lead to overshoot and oscillations in the control output values [14]. To compensate
for these phenomena, a derivative part is added to the equation and it becomes a PID-controller. The
derivative part is used to make the navigation more stable: it considers the current rate of change of
the error, predicts where the error is heading and compensates based on that prediction. This allows
for bigger proportional and integral gains while still keeping the control loop stable [15]. The complete
equation with all the parts can be seen in equation 1:

u(t) = Kp·e(t) + Ki·∫e(τ)dτ + Kd·de(t)/dt        (Equation 1)

where e(t) is the error between the reference value and the current value, and Kp, Ki and Kd are the
proportional, integral and derivative gains.

3 Method

3.1 Experiment design

To test the object detection system as well as the different control systems, a navigation test was set
up in the lab. The test was formulated as follows: a red can was placed as far to the left in the camera's
detection field as possible. The can was then moved away from the camera until the recognition
system could only barely identify it. As seen in figure 4, a mark was made on the floor for the can's
final position, and a start box for the robot was also marked on the floor in order to ensure consistency
and similarity between the tests. The test starts by running the image recognition code in combination
with the motor controller code, and the test stops when the robot touches the red can.

Figure 4. Testing track

3.2 Data collection

The data from each individual run is saved in a separate file after each run. This is implemented in the
code, which collects data in real time with associated timestamps for each event in the test. The
primary data collected is the current value in the control loop, which is the input coordinate from the
camera, along with the timestamps. Each test is done at least three times to ensure that single
occurrences or flukes can be eliminated. In the event of an error when starting the robot, the
timestamp data is corrected manually in Matlab.

3.3 Planned data analysis

The data files created by the script after a run are stored on a USB stick. The data is then imported into
Matlab, where a script plots tables and graphs automatically. The graphs from all the runs are then
saved to be further analyzed and compared.
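The thesis used Matlab for this step; as an equivalent sketch in Python, the log format written by the appendix program (camera value and timestamp separated by a comma on each line) could be read and plotted as follows, with matplotlib standing in for Matlab and the file name taken from the appendix.

# Sketch: read one run's log file and plot the object position over time.
import matplotlib.pyplot as plt

values, times = [], []
with open('datasetPI5D20.txt') as f:
    for line in f:
        value, timestamp = line.strip().split(',')
        values.append(float(value))
        times.append(float(timestamp))

plt.plot(times, values)
plt.axhline(0.5, linestyle='--')   # reference value (object centered)
plt.xlabel('Time [s]')
plt.ylabel('Object x-position in frame [0-1]')
plt.show()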

3.4 Electronic power plant and motors

Because the vehicle is belt driven, much of the motor power is lost to friction between the belts and the
wheels, but also between the belts and the ground when turning [16]. There is no requirement for high
vehicle speed; therefore, electrical motors with high torque and low RPM are required. The motors used
were sourced from old salvaged projects from previous years' mechatronics bachelor's theses. The
motors are 12 V DC motors with integrated gearboxes and a final shaft speed of 55,6 RPM.

The power supply on the vehicle is a 12 volt moped battery, chosen because of its high potential power
output combined with a large battery capacity. Since the Raspberry Pi 4 runs on 5 volts, a 5 volt supply
is also needed. To avoid voltage step-down hardware, a regular mobile phone power bank was used.
The power bank supplies the microcomputer with a peak current of 2,4 ampere, which is enough to run
the computer at its full potential. During testing of the motors and the electronic control systems, a lab
bench power supply was used to allow easy testing without depending on charging the battery.

3.5 Prototyping process

The prototyping process consisted of building a prototype to evaluate the construction and its
shortcomings and then constructing a second robot. The purpose of the iterative design process was to
reduce the number of shortcomings in the final robot. Examples of shortcomings of high importance
were the fit of the belt and the precision of the axles; addressing these allows better belt tension and
reduces vibrations in the construction when running.

3.6 Control system

The controllers for the robot were manually tuned and optimized through testing.

3.6.1 Input to control-systems

The main principle for all control systems is that the program determines where in the picture the
target object is and then calculates the mean x-coordinate of the object. The picture from the Raspberry
Pi camera has a resolution of 640x480. In the program this is rescaled to an x-axis in the picture that
goes from 0 to 1, left to right. The center value is therefore 0,5, which is the target value for the
object's position.
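A minimal sketch of this rescaling, following the calculation in the appendix program; the example bounding-box coordinates are illustrative.

# How a detected bounding box is turned into the controller input: the mean
# x-coordinate of the box, rescaled to the range 0 (left) .. 1 (right).
imW = 640                          # frame width in pixels

def box_to_input(xmin, xmax):
    x_mean = (xmax + xmin) / 2     # horizontal centre of the bounding box, in pixels
    return x_mean / imW            # normalized x-coordinate; 0.5 = centre of the image

print(box_to_input(100, 220))      # 0.25 -> object is in the left half of the frame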

3.6.2 Bang-Bang-controller

This is the first and simplest control system programmed. The control system is based on the
coordinate at which the object is located. If the object is located within a small span around the middle
value of 0,5, the output is 100 % on both motors. If the object is located to the left or to the right of this
span, the inner motor is shut off for a set period of time, making the robot turn towards the object.
This system is very simple and does not require PWM control over the motors to function; however,
the accuracy and the movement towards the object are poor.
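A minimal sketch of this behaviour, where the deadband around 0,5 and the returned duty cycles are illustrative assumptions.

# Bang-bang sketch: returns (left speed, right speed) in percent, given the
# normalized object position x. DEADBAND and the speeds are illustrative.
DEADBAND = 0.05

def bang_bang(x):
    if abs(x - 0.5) < DEADBAND:
        return 100, 100        # object roughly centred: both motors full speed
    elif x < 0.5:
        return 0, 100          # object to the left: stop the left (inner) motor
    else:
        return 100, 0          # object to the right: stop the right (inner) motor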

3.6.3 P-controller

The control system is based on a linear feedback model where the action taken depends linearly on
how large the error is. If the measured value is far from the desired value, a big change is made; if it is
closer to the desired value, a smaller action is taken. This is achieved with the help of PWM control
over the motors, which allows the motor speed, and thereby the turning radius, to be controlled
steplessly.

The program is built around two scenarios.

If the object appears at an x-coordinate below 0,5, the robot has to make a left turn in order to change
direction towards the object. This is achieved by driving the right motor at 100 % and the left motor at
a lower speed. The speed of the left motor is calculated with a linear dependence on how far to the left
the object appears in the coordinate system. For example, if the object appears at the far left edge of
the picture, the left motor will run at a very low speed close to 0 %, making a sharp turn with a small
turning radius. The closer the object appears to the centerline of the image, the faster the left motor
runs, giving a larger turning radius.

If the object appears at an x-coordinate above 0,5, the robot takes the same action as described in the
previous scenario, but the motor outputs are switched in order to steer right instead of left.
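A minimal sketch of the proportional steering described above; Kp matches the value used in the appendix program, and the clamping follows the same idea as the clamp function there.

# P-controller sketch: the inner track is slowed down linearly with the error
# while the outer track runs at 100 %. Returns (left speed, right speed).
def p_controller(x, Kp=200):
    error = 0.5 - x                        # positive: object to the left of centre
    inner = max(0, 100 - Kp * abs(error))  # inner track speed, clamped at 0 %
    if error > 0:                          # object to the left -> turn left
        return inner, 100
    elif error < 0:                        # object to the right -> turn right
        return 100, inner
    return 100, 100                        # object centred -> straight ahead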

3.6.4 PI-controller

The PI-controller is a regular P-controller with an added integral part, which has the purpose of
minimizing the remaining error after each data update. This was implemented by accumulating the
error over the previous data points. If the accumulated error is large, a larger term is added to the next
control loop output, which successively reduces the error after each data point. The integrating part
has a high tendency to cause overshoot and oscillations.

3.6.5 PID-controller

The PID-controller is based on the previous PI-controller; the difference is an added derivative part in
the correcting equation. This is implemented by storing the previous error in the code and using its
rate of change to predict the coming corrections. For example, if the error is decreasing, the derivative
part acts to decrease the rate of change, counteracting the I-part's tendency to overshoot.
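A compact sketch of the discrete PID update along the lines of the appendix implementation; the gains are the manually tuned values from the appendix, and in the robot the resulting correction is subtracted from the inner track's 100 % duty cycle.

# Discrete PID sketch: the integral is an accumulated sum of errors and the
# derivative uses the previous error.
class PID:
    def __init__(self, Kp=200, Ki=5, Kd=20, ref=0.5):
        self.Kp, self.Ki, self.Kd, self.ref = Kp, Ki, Kd, ref
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, x):
        error = self.ref - x
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        return self.Kp * error + self.Ki * self.integral + self.Kd * derivative

pid = PID()
correction = pid.update(0.3)   # one update for an object in the left part of the frame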

3.7 Training object detection model

The object detection model trained was an EfficientDet-Lite0. This model is a result of the research of
Mingxing Tan, Ruoming Pang and Quoc V. Le and is documented in the article “EfficientDet: Scalable
and Efficient Object Detection” [18]. The focus of the article and the model is efficiency, a factor that is
critical when running a model on an edge device. The model was trained using TensorFlow on a dataset
of 288 images of the object to be detected, which in this project was a 33 cl Coca-Cola Zero can. To
train an object detection model, files containing the class and location of the object are needed; for the
EfficientDet model the files must be given in Pascal VOC format. The files were created using
LabelImg, a program for marking where in a picture objects are located, which then creates XML files
containing that information. The files and images were separated into three categories: training,
testing and validation. The training subset consisted of 246 images and their associated XML files. The
testing and validation subsets consisted of 42 images and their associated XML files; the same 42
images were used for both, to give the training subset a larger share of the total number of pictures.

The model was trained using a Google Colab notebook, a cloud solution from Google that allows
Python code to be run in the cloud utilizing Google's computers and GPUs [19]. Google has created a
tutorial on training EfficientDet models using Google Colab [20], which was used to train our model.
As well as training an EfficientDet model, the Google Colab notebook also converts the model to the
TFLite format, quantizes it to int8 (8-bit integer) weights and compiles it into a model which can be
used on a Raspberry Pi with TensorFlow Lite and a Coral USB Accelerator TPU connected.

The pictures that the model was trained on were all taken in the lab with different backgrounds and
with different distances to the camera.
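Roughly, the training step in the Colab notebook corresponds to the following TFLite Model Maker sketch; the directory paths, epoch count and batch size are assumptions, while the label name matches the class used in the appendix program.

# Sketch: train an EfficientDet-Lite0 model with TFLite Model Maker on a
# Pascal VOC dataset. Paths and training hyperparameters are placeholders.
from tflite_model_maker import object_detector

spec = object_detector.EfficientDetLite0Spec()

train = object_detector.DataLoader.from_pascal_voc(
    'images/train', 'annotations/train', label_map={1: 'Cola Burk'})
validation = object_detector.DataLoader.from_pascal_voc(
    'images/validation', 'annotations/validation', label_map={1: 'Cola Burk'})

model = object_detector.create(train, model_spec=spec,
                               validation_data=validation,
                               epochs=50, batch_size=8)
model.export(export_dir='.', tflite_filename='model.tflite')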

3.8 Object detection on the Raspberry Pi

To be able to do object detection on the Raspberry Pi, an object detection model and a Python program
that can process pictures with the help of the model have to be present. The program which does
inference on the Raspberry Pi is a modified version of Evan Juras's program
“TFLite_detection_webcam.py” [21]. Juras's program is based on an object detection model called
SSD MobileNet V2. The MobileNet and EfficientDet models have the same output nodes but in
different orders. The nodes are responsible for the bounding box, class, number and score values, and
as the program is built around these outputs, it cannot use an EfficientDet model without modification.
Using Netron [22], the object detection models could be analyzed and their corresponding output
nodes found, in order to modify Juras's program. Both models produce four output nodes, as shown in
figure 5. In table 1 the output nodes and their corresponding outputs are shown for both models. The
program “TFLite_detection_webcam.py” was modified to react to the outputs of an EfficientDet
model, and functions for writing values and timestamps to files for data collection, the control system
implementation and GPIO pin output commands were added.

Figure 5. EfficientDet model

Model                 Class Index   Number Index   Box Index   Score Index
SSD MobileNet V2           2             3             0            1
EfficientDet-Lite0         3             2             1            0

Table 1. Image detection model node indexes
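The corresponding change in the program is to read the output tensors in the EfficientDet order from table 1, roughly as in the sketch below (the function assumes an already loaded TFLite interpreter).

# Sketch: read the detection results using the EfficientDet-Lite0 output node
# order from table 1, mirroring the modification made to Juras's program.
def read_efficientdet_outputs(interpreter):
    out = interpreter.get_output_details()
    scores  = interpreter.get_tensor(out[0]['index'])[0]  # score index 0
    boxes   = interpreter.get_tensor(out[1]['index'])[0]  # box index 1
    count   = interpreter.get_tensor(out[2]['index'])[0]  # number index 2
    classes = interpreter.get_tensor(out[3]['index'])[0]  # class index 3
    return boxes, classes, scores, count
# For SSD MobileNet V2 the indexes would instead be 0 (boxes), 1 (scores),
# 2 (classes) and 3 (number), as listed in table 1.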

4 Results

4.1 Construction

4.1.1 First prototype

Because the main challenges of the project are the image recognition system and the motor control, a
solid platform was required so that all the electrical components could be tested as early as possible.
The design strived for is a tracked vehicle with two belts, one on each side, with one motor per belt. To
avoid waiting for a belt order, the first prototype was built around an existing engine timing belt from a
car. The chassis was quickly put together from scrap metal and wooden wheels laser cut from sheet
plywood. This allowed for quick testing of the driving motors, and it was quickly found that the motors
used at the time were too weak to drive the belt with the sought-after accuracy. The problems occurred
because of too much friction in the belts and poor accuracy in the manufacturing methods. This first
prototype, seen in figure 6, was therefore scrapped.

Figure 6. First prototype

4.1.2 Final robot

New smaller belts were acquired and a new prototype was designed around them. This time the only
manufacturing method was laser cutting of acrylic plastic, mainly because the high accuracy of laser
cutting allows for better belt alignment and stability. All the parts were designed and assembled in
CAD software (Solid Edge) to ensure good fitment and efficient manufacturing of the parts. To reduce
the high friction, ball bearings are used in all of the non-driving wheels.

Figure 7. CAD-model of the robot

Figure 8. CAD-model of the robot

Figure 9. CAD-model of the robot

The robot was constructed mainly from laser cut acrylic panels, with the main body glued together; the
resulting structure was rigid, and with the side panels mounted using M8 bolts it became both rigid
and tough. For increased stability and to prevent flexing under high load, secondary side panels are
mounted with five M8 bolts on each side, which also helps with the tensioning of the belt. The belt
type used is a v-belt from a car with a total length of 735 mm [17]. The laser cut acrylic parts had a
high precision which, in conjunction with the rigid body, handled the belt drive without any vibrations
while driving. The belt drive made it possible for the robot to drive over small ledges and other small
obstacles. The robot can be seen in figure 10.

Figure 10. The robot in complete stage

4.2 Results of different control systems
The different control systems were evaluated by running the robot on a predetermined track in the lab,
with the outline of the test described in 3.1 Experiment design. During the tests of the controllers, the
program logged timestamps and the position in the webcam image where the object detection system
saw the object. Matlab was used to construct graphs from the data files created by the robot, which are
shown below in the figures. The control systems tested and shown below are the bang-bang, P-, PI- and
PID-controllers.

The Y-axis of the graphs represents where in the picture the robot sees the specified object, where the
value 0,5 means that the object is in the middle of the picture. The value 1 on the Y-axis means that the
object is as far to the right in the picture as possible, and 0 as far to the left as possible. The X-axis
represents the elapsed time in seconds. The graphs show how fast the robot reaches a straight
trajectory towards the object and what its trajectory looks like. In some of the results shown below, the
value on the Y-axis oscillates. The oscillations on the Y-axis correspond to oscillations in the robot's
path, as when the robot overshoots its target the object appears in the other half of the picture. The
graphs therefore also represent the robot's path towards the object.

4.2.1 Bang-bang

Figure 11. Bang-bang-controller results

4.2.2 P-controller

Figure 12. P-controller results

4.2.3 PI-controller

Figure 13. PI-controller results

4.2.4 PID-controller

Figure 14. PID-controller results

Figure 15. PID-controller results

4.2.5 Evaluation of the image detection model

All the controllers tested successfully navigated to the location of the object. In some of the tests, the
object detection algorithm did not locate the object straight away and had to run several inferences
before it started to locate the object; this was compensated for when the figures above were created.
Tests with different conditions for the object detection were also done, where the accuracy differed
depending on the disturbances and conditions in the picture. The scenarios tested were with and
without a white background, and with low and high lighting conditions. The results of the test are
shown in figure 16.

Figure 16. Detection of object in different conditions

During the tests the object detection program reached a mean rate of about 16,3 FPS. The object
detection model was able to distinguish objects from a range of about 2 meters; this however varied a
lot depending on background and lighting conditions, where a noisier background and lower lighting
conditions led to a shorter range at which the object detection program could distinguish the object.

5 Discussion

5.1 Different control-systems

Among the different control systems tested, some turned out to be more effective and easier to set up
than others. The bang-bang controller, for example, was the least efficient control system tested,
because it is in the process of changing direction 100 % of the time. Since turning is done by reducing
the motor speed of the inner track, in this case from 100 % motor speed to 0 %, the forward speed is
dramatically decreased. The results of this run can be seen in figure 11.

The P-controller was also very easy to set up and performed surprisingly well. The robot made a rapid
but soft turn towards the object and went in a straight line for most of the path to the object. This
means that both motors run at 100 % most of the time after the initial rapid turn, leading to a very
efficient path to the object. As seen in figure 12 the path did not overshoot; however, a steady-state
error occurred. This was no problem for the robot because of its large width compared to the object,
and the closer the robot comes to the can, the bigger the can gets in the picture. This results in a very
small steady-state error at the end, which in this case can be completely ignored.

The PI-controller was implemented to reach the target line in the graph in figure 12 even faster.
However, the larger the Ki-value (integral constant) gets, the more overshoot occurs. This is because a
bigger Ki-value gives a bigger compensation for the error, resulting in overcompensation as seen in
figure 13. This leads to further oscillation towards the target object, making the drive towards the
object slower because both motors are not constantly running at 100 %.

To compensate for the overshoot, the PID-controller was implemented. It was first tested with a
constant Ki-value, as seen in figure 14; different Kd-values with a constant Ki-value did not improve
the oscillations. In figure 15 both the Ki-value and the Kd-value are tweaked, which managed to almost
completely remove the overshoot. However, some small tendencies to oscillate still remained.

5.2 Hardware's impact on the robot

The motors that were used had a maximum speed of 55,6 RPM, which meant that the top speed of the
robot was not that high. The combination of the low speed of the robot and a high refresh rate of about
16,3 FPS might have contributed to the fact that all of the tested control systems could navigate to the
object successfully. With a higher motor RPM, the outcome of which control systems could navigate to
the object successfully might have changed; the system would have become more unstable and would
have reacted more to changes in the control system parameters. The fact that all control systems could
navigate to the object gives the robot room to go faster, as even manually tuned controllers with very
different parameters were all successful in their navigation task.

5.3 Different disturbances for the object detection
The accuracy of the object detection system was tested in four different scenarios: normal light with a
white background, increased light with a white background, normal light with a noisy background, and
increased light with a noisy background. The object detection system had the highest confidence score
in high light conditions with a white background, and the second highest with a noisy background and
high light conditions. High light conditions therefore seem to be the more important of the two factors
when it comes to confidence scores. The reason may be that the colors of the can are more vibrant, so
the object detection has an easier time distinguishing the object. The difference also showed in the
range at which the object detection model could detect objects, where a white background and higher
lighting conditions made it possible for the object detection program to locate objects further away.

5.4 The trained model


The model was trained using 288 images and worked quite well. It had some shortcomings when it
came to lighting conditions and background noise, which led to a decreased effective range for the
object detection program. The model was trained using photos taken in the lab during midday (high
lighting conditions). The pictures a model is trained with have a huge effect on how well the model can
recognize objects, and the possibility of biased training data might be a contributor to the effects seen
when testing the object detection program.

6 Conclusion

6.1 Conclusions

The robot was able to find and drive to the object in all the tests with the different controllers; even
with controllers that caused large oscillations, the robot was able to successfully navigate towards the
object. This answers whether a robot can be controlled with real-time object detection: it can. The
results also indicate that the robot could increase its operating speed and still be able to navigate
towards the object.

In terms of control systems, the preferred control system in this application is the PID-controller. Its
gain was better than that of the P-controller, and the PID-controller's oscillations had an amplitude
that was almost negligible. The PID-controller optimized the gain and, compared to the bang-bang
controller, performed far better.

6.2 Limitations

Time was the major limitation of this project. With limited time to complete testing, construction and
background studies, different aspects of the robot and its capabilities were left unexplored. One such
example is the implementation of a system for picking up the object that the robot drives to. By setting
the scope of the project, the work was limited to the research questions that were decided upon,
leaving room for further exploration.

6.3 Future work and improvements

6.3.1 Control-systems

Since the PID-controller in this project was manually tuned, optimized performance of the robot was
never reached. By systematically optimizing the Kp-, Ki- and Kd-values in the PID function, the robot's
path and efficiency to the object can be fully optimized.

6.3.2 Practical purpose

Since the vehicle's main purpose is to find objects of different kinds, some sort of robot arm with the
ability to pick things up, such as trash, would give the project more of a practical purpose. Since trash
can occur in a wide variety of environments, such as on the street, in the woods or indoors, the robot's
drivetrain opens up many possibilities in terms of driving in different rough environments. If the
chassis is closed, the electronic components can be securely protected inside the waterproof chassis,
allowing for wet driving conditions.

6.3.3 Limit of real time object detection on edge devices

The robot constructed in this project successfully navigated to the specified object with all of the
controllers tested on it. This opens up the possibility of speeding up the robot and testing where its
limit is. The edge device which runs the program is limited in its computational power, so there must
be a limit to what is possible to achieve.

References

[1]: Tzutalin, LabelImg, https://github.com/tzutalin/labelImg, version: binary v1.8.1, 2022-05-08
[2]: G. Yasuda and B. Ge, Color Based Object Recognition, Localization, and Movement Control for
Multiple Wheeled Mobile Robots, https://ieeexplore.ieee.org/abstract/document/1433343, 2022-05-08
[3]: M. N. Ab Wahab, M. H. Mohd Noor and M. F. Akbar, Efficientnet-Lite and Hybrid CNN-KNN
Implementation for Facial Expression Recognition on Raspberry Pi,
https://ieeexplore.ieee.org/abstract/document/9540698, 2022-05-08
[4]: Raspberry Pi Foundation, https://www.raspberrypi.org/about/, 2022-05-08
[5]: Raspberry Pi Foundation, https://www.raspberrypi.com/documentation/computers/os.html,
2022-05-08
[6]: T. Hirzel, Basics of PWM (Pulse Width Modulation),
https://docs.arduino.cc/learn/microcontrollers/analog-output, 2022-05-08
[7]: Solectroshop, Choosing the Right Type of Motor Driver for Your Project,
https://solectroshop.com/en/blog/choosing-the-right-type-of-motor-driver-for-your-project-n14,
2022-05-08
[8]: Google LLC, Get started with the USB Accelerator,
https://coral.ai/docs/accelerator/get-started/#requirements, 2022-05-08
[9]: OpenCV team, https://opencv.org/about/, 2022-05-08
[10]: OpenCV team, https://docs.opencv.org/4.x/, 2022-05-08
[11]: Google LLC, Introduction to TensorFlow, https://www.tensorflow.org/learn, 2022-05-08
[12]: Google LLC, TensorFlow Lite, https://www.tensorflow.org/lite/guide#2_convert_the_model,
2022-05-08
[13]: Control Station, What is proportional-only control? When should P-only control be used?,
https://controlstation.com/blog/what-is-proportional-only-control-when-should-p-only-control-be-used/,
2022-05-08
[14]: P. Yotov, Proportional Integral Control,
https://apmonitor.com/pdc/index.php/Main/ProportionalIntegralControl, 2022-05-08
[15]: Control Solutions Minnesota, PID for Dummies,
https://www.csimn.com/CSI_pages/PIDforDummies.html, 2022-05-08
[16]: D. W. Baker and W. Haynes, Flexible Belt Friction,
https://engineeringstatics.org/Chapter_09-flexible-belt-friction.html, 2022-05-08
[17]: Biltema, Fan belt 9,5x735 mm,
https://www.biltema.se/en-se/car---mc/car-spares/engine-parts/fan-belts/fan-belt-95x735mm-2000026025,
2022-05-08
[18]: M. Tan, R. Pang and Q. V. Le, EfficientDet: Scalable and Efficient Object Detection,
https://arxiv.org/abs/1911.09070, 2022-05-08
[19]: Colaboratory, https://colab.research.google.com/#scrollTo=5fCEDCU_qrC0, 2022-05-08
[20]: Colaboratory, Retrain EfficientDet for the Edge TPU with TensorFlow Lite Model Maker,
https://colab.research.google.com/github/google-coral/tutorials/blob/master/retrain_efficientdet_model_maker_tf2.ipynb#scrollTo=Gb7qyhNL1yWt,
2022-05-08
[21]: EdjeElectronics, TensorFlow-Lite-Object-Detection-on-Android-and-Raspberry-Pi,
https://github.com/EdjeElectronics/TensorFlow-Lite-Object-Detection-on-Android-and-Raspberry-Pi,
2022-05-08
[22]: L. Roeder, Netron, https://netron.app/, 2022-05-08
[23]: Raspberry Pi Foundation, https://www.raspberrypi.com/documentation/computers/os.html,
2022-05-08
[24]: T. Hirzel, Basics of PWM (Pulse Width Modulation),
https://docs.arduino.cc/learn/microcontrollers/analog-output, 2022-05-08
[25]: Electrokit,
https://www.electrokit.com/produkt/dubbel-motordrivare-for-raspberry-pi-6-24v-10a/, 2022-05-08

Appendix

Code:

# Python program modified by: Simon Ryberg and Jonathan Jansson


# Used in bachelor thesis at KTH
# Date: 05/09/2022
# The program has been modified to run EfficientDet models
# Functions for using GPIO pins have been added
# A PID controller has been implemented and written from scratch
# Code for saving timestamps and camera values has been added
# Classes for use in PID control have been added
# The program is based upon Evan Juras's program "TFlite_detection_webcam.py"

######## Webcam Object Detection Using Tensorflow-trained Classifier #########


#
# Author: Evan Juras
# Date: 10/27/19
#
# This code is based off the TensorFlow Lite image classification example at:
#
# https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/examples/python/label_image.py
#
# I added my own method of drawing boxes and labels using OpenCV.

# Import packages
import os
import argparse
import cv2
import numpy as np
import sys
import time
from threading import Thread
import importlib.util

import RPi.GPIO as GPIO # using the RPi.GPIO module


from time import sleep # import function sleep for delay
GPIO.setmode(GPIO.BCM) # GPIO numbering
GPIO.setwarnings(False) # disable warnings from GPIO
AN2 = 13 # set pwm2 pin on MD10-Hat
AN1 = 12 # set pwm1 pin on MD10-hat
DIG2 = 24 # set dir2 pin on MD10-Hat
DIG1 = 26 # set dir1 pin on MD10-Hat
GPIO.setup(AN2, GPIO.OUT) # set pin as output

GPIO.setup(AN1, GPIO.OUT) # set pin as output
GPIO.setup(DIG2, GPIO.OUT) # set pin as output
GPIO.setup(DIG1, GPIO.OUT) # set pin as output
sleep(1) # delay for 1 second
p1 = GPIO.PWM(AN1, 100) # set pwm for M1
p2 = GPIO.PWM(AN2, 100) # set pwm for M2

class referenceValues():
    def __init__(self, ref=0.5):
        self.ref = ref

class memory:
    def __init__(self):
        self.sum = 0
        self.preVal = 0

    def addToIntegral(self, add):
        self.sum = self.sum + add

def controler(inputen):
    refVarde = referenceValues()
    diff = (float(refVarde.ref) - float(inputen))  # error relative to the reference value 0.5
    integral.addToIntegral(diff)                   # accumulate the error (integral part)
    iValue = integral.sum
    dValue = integral.preVal - diff                # change of the error (derivative part)
    integral.preVal = diff                         # store the current error for the next loop

    Kp = 200
    Ki = 5
    Kd = 20
    if inputen == 0.5:
        h = 100
        v = 100
        GPIO.output(DIG1, GPIO.LOW)  # set DIG1 LOW, to control direction
        GPIO.output(DIG2, GPIO.LOW)  # set DIG2 LOW, to control direction
        p1.start(h)                  # drive M1 at duty cycle h
        p2.start(v)                  # drive M2 at duty cycle v
        #print('framåt')

    elif inputen > 0.5:
        h = 100 - Kp*(abs(diff)) - Ki*iValue + Kd*dValue
        h = clamp(h, 0, 100)
        #h = 0
        v = 100
        GPIO.output(DIG1, GPIO.LOW)  # set DIG1 LOW, to control direction
        GPIO.output(DIG2, GPIO.LOW)  # set DIG2 LOW, to control direction
        p1.start(h)                  # inner motor slowed down by the controller
        p2.start(v)                  # outer motor at full duty cycle
        #print('vänster')

    elif inputen < 0.5:
        v = 100 - Kp*(abs(diff)) - Ki*iValue + Kd*dValue
        v = clamp(v, 0, 100)
        #v = 0
        h = 100
        GPIO.output(DIG1, GPIO.LOW)  # set DIG1 LOW, to control direction
        GPIO.output(DIG2, GPIO.LOW)  # set DIG2 LOW, to control direction
        p1.start(h)                  # outer motor at full duty cycle
        p2.start(v)                  # inner motor slowed down by the controller
        #print('höger')

    # log the camera value and a timestamp for the data analysis
    cameravärde = str(inputen)
    sluttid = time.time()
    tidsnotering = str(sluttid - starttid)
    f = open("datasetPI5D20.txt", "a")
    f.write(cameravärde + ",")
    f.write(tidsnotering + "\n")
    f.close()

def clamp(n, minn, maxn):
    if n < minn:
        return minn
    elif n > maxn:
        return maxn
    else:
        return n

def motorstyrning(classes, xOrt):
    if classes == 'Cola Burk':
        controler(xOrt)

# Define VideoStream class to handle streaming of video from webcam in separate processing thread
# Source - Adrian Rosebrock, PyImageSearch:
# https://www.pyimagesearch.com/2015/12/28/increasing-raspberry-pi-fps-with-python-and-opencv/
class VideoStream:
    """Camera object that controls video streaming from the Picamera"""
    def __init__(self, resolution=(640, 480), framerate=30):
        # Initialize the PiCamera and the camera image stream
        self.stream = cv2.VideoCapture(0)
        ret = self.stream.set(cv2.CAP_PROP_FOURCC, cv2.VideoWriter_fourcc(*'MJPG'))
        ret = self.stream.set(3, resolution[0])
        ret = self.stream.set(4, resolution[1])

        # Read first frame from the stream
        (self.grabbed, self.frame) = self.stream.read()

        # Variable to control when the camera is stopped
        self.stopped = False

    def start(self):
        # Start the thread that reads frames from the video stream
        Thread(target=self.update, args=()).start()
        return self

    def update(self):
        # Keep looping indefinitely until the thread is stopped
        while True:
            # If the camera is stopped, stop the thread
            if self.stopped:
                # Close camera resources
                self.stream.release()
                return

            # Otherwise, grab the next frame from the stream
            (self.grabbed, self.frame) = self.stream.read()

    def read(self):
        # Return the most recent frame
        return self.frame

    def stop(self):
        # Indicate that the camera and thread should be stopped
        self.stopped = True

# Define and parse input arguments

MODEL_NAME = "KEX"
GRAPH_NAME = "edgetpu.tflite"
LABELMAP_NAME = "labelmap.txt"
min_conf_threshold = 0.5
resW, resH = 640,480
imW, imH = int(resW), int(resH)
use_TPU = True

starttid = time.time() # define the start time for the data logging

# Import TensorFlow libraries
# If tflite_runtime is installed, import interpreter from tflite_runtime, else import from regular tensorflow
# If using Coral Edge TPU, import the load_delegate library
pkg = importlib.util.find_spec('tflite_runtime')
if pkg:
    from tflite_runtime.interpreter import Interpreter
    if use_TPU:
        from tflite_runtime.interpreter import load_delegate
else:
    from tensorflow.lite.python.interpreter import Interpreter
    if use_TPU:
        from tensorflow.lite.python.interpreter import load_delegate

# If using Edge TPU, assign filename for Edge TPU model
if use_TPU:
    # If user has specified the name of the .tflite file, use that name, otherwise use default 'edgetpu.tflite'
    if (GRAPH_NAME == 'detect.tflite'):
        GRAPH_NAME = 'edgetpu.tflite'

# Get path to current working directory


CWD_PATH = os.getcwd()

# Path to .tflite file, which contains the model that is used for object detection
PATH_TO_CKPT = os.path.join(CWD_PATH,MODEL_NAME,GRAPH_NAME)

# Path to label map file


PATH_TO_LABELS = os.path.join(CWD_PATH,MODEL_NAME,LABELMAP_NAME)

# Load the label map


with open(PATH_TO_LABELS, 'r') as f:
    labels = [line.strip() for line in f.readlines()]

# Load the Tensorflow Lite model.


# If using Edge TPU, use special load_delegate argument
if use_TPU:
    interpreter = Interpreter(model_path=PATH_TO_CKPT,
                              experimental_delegates=[load_delegate('libedgetpu.so.1.0')])
    print(PATH_TO_CKPT)
else:
    interpreter = Interpreter(model_path=PATH_TO_CKPT)

interpreter.allocate_tensors()

# Get model details


input_details = interpreter.get_input_details()
#print(input_details)
output_details = interpreter.get_output_details()
#print(output_details)
height = input_details[0]['shape'][1]
width = input_details[0]['shape'][2]

floating_model = (input_details[0]['dtype'] == np.float32)

input_mean = 127.5
input_std = 127.5

# Initialize frame rate calculation

frame_rate_calc = 1
freq = cv2.getTickFrequency()

# Initialize video stream


videostream = VideoStream(resolution=(imW,imH),framerate=30).start()
time.sleep(1)

# Create the integral memory object for the controller

integral = memory()

#for frame1 in camera.capture_continuous(rawCapture, format="bgr",use_video_port=True):


while True:

    # Start timer (for calculating frame rate)
    t1 = cv2.getTickCount()

    # Grab frame from video stream
    frame1 = videostream.read()

    # Acquire frame and resize to expected shape [1xHxWx3]
    frame = frame1.copy()
    frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    frame_resized = cv2.resize(frame_rgb, (width, height))
    input_data = np.expand_dims(frame_resized, axis=0)

    # Normalize pixel values if using a floating model (i.e. if model is non-quantized)
    if floating_model:
        input_data = (np.float32(input_data) - input_mean) / input_std

    # Perform the actual detection by running the model with the image as input
    interpreter.set_tensor(input_details[0]['index'], input_data)
    interpreter.invoke()

    # Retrieve detection results, using the EfficientDet output node order (see table 1)
    boxes = interpreter.get_tensor(output_details[1]['index'])[0]   # Bounding box coordinates of detected objects
    scores = interpreter.get_tensor(output_details[0]['index'])[0]  # Confidence of detected objects
    classes = interpreter.get_tensor(output_details[3]['index'])    # Class index of detected objects
    #num = interpreter.get_tensor(output_details[3]['index'])[0]    # Total number of detected objects (inaccurate and not needed)
    #print(boxes)
    #print(scores)

    # Loop over all detections and draw detection box if confidence is above minimum threshold
    for i in range(len(scores)):
        if ((scores[i] > min_conf_threshold) and (scores[i] <= 1.0)):

            # Get bounding box coordinates and draw box
            # Interpreter can return coordinates that are outside of image dimensions, need to force them to be within image using max() and min()
            ymin = int(max(1, (boxes[i][0] * imH)))
            xmin = int(max(1, (boxes[i][1] * imW)))
            ymax = int(min(imH, (boxes[i][2] * imH)))
            xmax = int(min(imW, (boxes[i][3] * imW)))

            xMean = (xmax + xmin) / 2   # horizontal centre of the bounding box
            xOrt = xMean / imW          # normalized x-coordinate used as controller input

            cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), (10, 255, 0), 2)

            # Draw label
            object_name = 'Cola Burk'   # object name hardcoded to the single trained class
            motorstyrning(object_name, xOrt)
            label = '%s: %d%%' % (object_name, int(scores[i]*100))  # Example: 'person: 72%'
            labelSize, baseLine = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.7, 2)  # Get font size
            label_ymin = max(ymin, labelSize[1] + 10)  # Make sure not to draw label too close to top of window
            cv2.rectangle(frame, (xmin, label_ymin-labelSize[1]-10), (xmin+labelSize[0], label_ymin+baseLine-10), (255, 255, 255), cv2.FILLED)  # Draw white box to put label text in
            cv2.putText(frame, label, (xmin, label_ymin-7), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 0), 2)  # Draw label text

    # Draw framerate in corner of frame
    cv2.putText(frame, 'FPS: {0:.2f}'.format(frame_rate_calc), (30, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 0), 2, cv2.LINE_AA)

    # All the results have been drawn on the frame, so it's time to display it.
    cv2.imshow('Object detector', frame)

    # Calculate framerate
    t2 = cv2.getTickCount()
    time1 = (t2-t1)/freq
    frame_rate_calc = 1/time1

    # Press 'q' to quit
    if cv2.waitKey(1) == ord('q'):
        break

# Clean up
cv2.destroyAllWindows()
videostream.stop()

# Stop motors
GPIO.output(DIG1, GPIO.LOW)  # set DIG1 LOW, to control direction
GPIO.output(DIG2, GPIO.LOW)  # set DIG2 LOW, to control direction
p1.start(0)  # set duty cycle for M1 to 0%
p2.start(0)  # set duty cycle for M2 to 0%

