
Elmar Wings

Artificial Intelligence with the Jetson Nano


Real-Time Object Recognition with Pi Camera

Version: 1765
October 24, 2023
Contents

Contents 3

List of Figures 11

I. Machine Learning 17
1. Artificial intelligence - When are machines intelligent? 19
1.1. What is Artificial Intelligence? . . . . . . . . . . . . . . . . . . . . . . 19
1.2. Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.3. Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.4. Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.5. Limits of Artificial Intelligence . . . . . . . . . . . . . . . . . . . . . . 22

2. Processes for Knowledge Acquisition in Databases 25


2.1. Knowledge Discovery in Databases (KDD Process) . . . . . . . . . . . 25
2.1.1. Application Areas . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.1.2. Process Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.1.3. KDD Process based on Fayyad . . . . . . . . . . . . . . . . 25
2.2. Generic Process Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.2.1. Application Domain Understanding . . . . . . . . . . . . . . . 28
2.2.2. Understanding the Data (Data Understanding) . . . . . . . . . 28
2.2.3. Data Preparation and Identification of DM Technology . . . . 28
2.2.4. Data Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.2.5. Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.2.6. Knowledge Deployment (Knowledge Consolidation and Deploy-
ment) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.3. CRISP-DM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.4. Application-Oriented Process Models . . . . . . . . . . . . . . . . . . . 29
2.5. Role of the User . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.6. KDD Process using the Example of Image Classification with an Edge
Computer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.6.1. Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.6.2. Data Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.6.3. Data Preparation . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.6.4. Data Transformation . . . . . . . . . . . . . . . . . . . . . . . . 31
2.6.5. Data Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.6.6. Evaluation/Verification . . . . . . . . . . . . . . . . . . . . . . 32
2.6.7. Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.7. Data Analysis and Data Mining Methods at a Glance . . . . . . . . . 32
2.7.1. Definition Data Analysis . . . . . . . . . . . . . . . . . . . . . . 32
2.7.2. Statistical Data Analysis . . . . . . . . . . . . . . . . . . . . . 32
2.7.3. Overview of Data Mining Methods and Procedures . . . . . . . 34

3. Neural Networks 37
3.1. Mathematical Modelling . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.1.1. Weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38


3.1.2. Bias . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.1.3. Activation and Output Function . . . . . . . . . . . . . . . . . 39
3.2. Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.2.1. Propagation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.2.2. Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.2.3. Delta Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.2.4. Error Backpropagation . . . . . . . . . . . . . . . . . . . . . . . 42
3.2.5. Problem of Overfitting . . . . . . . . . . . . . . . . . . . . . . . 42
3.2.6. Epochs, Batches and Steps . . . . . . . . . . . . . . . . . . . . 42
3.2.7. Dropout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.2.8. Correct Classification Rate and Loss Function . . . . . . . . . . 43
3.2.9. Training Data, Validation Data and Test Set . . . . . . . . . . 43
3.2.10. Transfer Learning . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.2.11. Weight Imprinting . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.3. Convolutional Neural Networks . . . . . . . . . . . . . . . . . . . . . . 45
3.3.1. Convolutional Layer . . . . . . . . . . . . . . . . . . . . . . . . 45
3.3.2. Pooling Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.3.3. Fully Connected Layer . . . . . . . . . . . . . . . . . . . . . . . 46
3.3.4. Hyperparameter . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.3.5. Fine-tuning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.3.6. Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.3.7. AlexNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.3.8. YOLO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.4. Images and Image Classification with a Convolutional Neural Network
(CNN) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.4.1. Representation of Images . . . . . . . . . . . . . . . . . . . . . 50
3.5. Convolution Neural Network . . . . . . . . . . . . . . . . . . . . . . . 52
3.5.1. CNN Architectures . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.5.2. LeNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.5.3. AlexNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.5.4. InceptionNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.5.5. VGG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.5.6. Residual Neural Network (ResNet) . . . . . . . . . . . . . . . . 54
3.5.7. MobileNet & MobileNetV2 . . . . . . . . . . . . . . . . . . . . 54
3.5.8. EfficientNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.5.9. General Structure . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.5.10. Input and Output . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.5.11. Convolution Layer . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.5.12. Padding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.5.13. Activation function . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.5.14. Pooling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.5.15. Flattening . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.5.16. Dense Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.5.17. Model used for a CIFAR 10 classification . . . . . . . . . . . . 60

4. Databases and Models for Machine Learning 63


4.1. Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.1.1. Data Set MNIST . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.1.2. CIFAR-10 and CIFAR-100 . . . . . . . . . . . . . . . . . . . . 65
4.1.3. Fisher’s Iris Data Set . . . . . . . . . . . . . . . . . . . . . . . 67
4.1.4. Common Objects in Context - COCO . . . . . . . . . . . . . . 70
4.1.5. ImageNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.1.6. Visual Wake Words . . . . . . . . . . . . . . . . . . . . . . . . 75
4.2. FaceNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.3. Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

5. Frameworks and Libraries 77


5.1. Development Environment . . . . . . . . . . . . . . . . . . . . . . . . . 77
5.1.1. Jupyter-Notebook . . . . . . . . . . . . . . . . . . . . . . . . . 77
5.2. Frameworks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
5.2.1. TensorFlow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
5.2.2. TensorFlow Lite . . . . . . . . . . . . . . . . . . . . . . . . . . 78
5.2.3. TensorRT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
5.2.4. Keras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
5.2.5. PyTorch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
5.2.6. Caffe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5.2.7. Caffe2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5.2.8. Theano . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5.2.9. Apache MXNet . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5.2.10. CNTK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5.2.11. Deeplearning4J . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5.2.12. Chainer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5.2.13. FastAI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5.3. General libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5.3.1. NumPy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5.3.2. SciPy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5.3.3. pandas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5.3.4. IPython . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5.3.5. Matplotlib . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5.3.6. scikit-learn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5.3.7. Scrapy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.3.8. NLTK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.3.9. Pattern . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.3.10. Seaborn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.3.11. Bokeh . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.3.12. basemap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.3.13. NetworkX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.3.14. LightGBM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.3.15. Eli5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.3.16. Mlpy - Machine Learning . . . . . . . . . . . . . . . . . . . . . 82
5.3.17. Statsmodels - Statistical Data Analysis . . . . . . . . . . . . . 82
5.4. Specialised Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.4.1. OpenCV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.4.2. OpenVINO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.4.3. Compute Unified Device Architecture (CUDA) . . . . . . . . . 82
5.4.4. OpenNN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.5. Special Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.5.1. glob . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.5.2. os . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.5.3. PIL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.5.4. LCC15 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.5.5. math . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.5.6. cv2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.5.7. random . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.5.8. pickle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
5.5.9. PyPI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

6. Libraries, Modules, and Frameworks 85


6.1. TensorFlow Framework . . . . . . . . . . . . . . . . . . . . . . . . . . 85
6.1.1. TensorFlow Use Case . . . . . . . . . . . . . . . . . . . . . . . . 85

6.2. OpenCV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
6.2.1. Application of OpenCV . . . . . . . . . . . . . . . . . . . . . . 86
6.2.2. Image Processing . . . . . . . . . . . . . . . . . . . . . . . . . . 86
6.2.3. How does a computer read an image . . . . . . . . . . . . . . . 86
6.3. MediaPipe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
6.4. MediaPipe Applications . . . . . . . . . . . . . . . . . . . . . . . . . . 87
6.4.1. Hand Landmarks Detection . . . . . . . . . . . . . . . . . . . . 87
6.4.2. Face Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

II. Jetson Nano 89


7. Jetson Nano 91
7.1. The Jetson Nano and the Competition . . . . . . . . . . . . . . . . . . 91
7.2. Applications of the Jetson Nano . . . . . . . . . . . . . . . . . . . . . . 91

8. Application “Image Recognition with the Jetson” 93


8.1. Application Example “Bottle Detection” . . . . . . . . . . . . . . . . . 93
8.2. Necessary Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

9. NVIDIA Jetson Nano Developer Kit 95


9.1. NVIDIA Jetson Nano Developer Kit . . . . . . . . . . . . . . . . . . . 95
9.2. Construction of the Jetson Nano . . . . . . . . . . . . . . . . . . . . . 95
9.2.1. Technical data . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
9.3. ToDo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
9.4. Interfaces of the Jetson Nano . . . . . . . . . . . . . . . . . . . . . . . 98
9.4.1. J41 Expansion Header Pinout - GPIO Pins . . . . . . . . . . . 98
9.4.2. Serial interface J44 UART . . . . . . . . . . . . . . . . . . . . . 99
9.4.3. Interface J40 Buttons . . . . . . . . . . . . . . . . . . . . . . . 99
9.4.4. I²C Interface . . . . . . . . . . . . . . . . . . . . . . . . . . 99
9.4.5. I²S Interface . . . . . . . . . . . . . . . . . . . . . . . . . . 99
9.4.6. Serial Peripheral Interface (SPI) . . . . . . . . . . . . . . . . . 99
9.4.7. Power over Ethernet . . . . . . . . . . . . . . . . . . . . . . . . 101
9.4.8. Interface J15 to the fan . . . . . . . . . . . . . . . . . . . . . . 101
9.5. Structure of the Jetson Nano . . . . . . . . . . . . . . . . . . . . . . . 101
9.6. JetPack operating system for the Jetson Nano . . . . . . . . . . . . . . 101
9.6.1. Description of the Operating System . . . . . . . . . . . . . 101
9.6.2. Installing the operating system . . . . . . . . . . . . . . . . . . 101
9.6.3. Setup with graphical user interface and first start . . . . . . . . 104
9.6.4. Setup without graphic interface . . . . . . . . . . . . . . . . . . 106
9.7. Remote access to the Jetson Nano . . . . . . . . . . . . . . . . . . . . 106
9.7.1. Switching on and activating Secure Shell . . . . . . . . . . . . . 107
9.7.2. Remote Desktop . . . . . . . . . . . . . . . . . . . . . . . . . . 108
9.8. Software installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
9.8.1. Installing pip . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
9.8.2. Installing cmake and git . . . . . . . . . . . . . . . . . . . . . . 109
9.9. Installing the General Purpose Input Output (GPIO)-Library . . . . . 109
9.10. Raspberry Pi Camera Module V2 . . . . . . . . . . . . . . . . . . . . . 110
9.10.1. Camera resolution . . . . . . . . . . . . . . . . . . . . . . . . . 110
9.10.2. Camera cable . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
9.10.3. CSI-3-Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
9.10.4. Starting up the Raspberry Pi Camera with the Jetson Nano . . 111

10.Camera Calibration using OpenCV 113


10.1. Calibration using Checkerboard . . . . . . . . . . . . . . . . . . . . . . 113

10.2. Method to Calibrate . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113


10.2.1. Evaluation of Results . . . . . . . . . . . . . . . . . . . . . . . 114

11.Distortion and Camera Calibration 117


11.1. Why Distortion? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
11.1.1. Types of Distortion . . . . . . . . . . . . . . . . . . . . . . . . . 117
11.2. Calibration using Checkerboard . . . . . . . . . . . . . . . . . . . . . . 118
11.3. Method to Calibrate . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
11.3.1. Step1: Finding the corners of the chessboard . . . . . . . . . . 119
11.3.2. Step2: Calibrating the camera . . . . . . . . . . . . . . . . . . 121
11.3.3. Step3: Undistort images . . . . . . . . . . . . . . . . . . . . . . 122
11.3.4. Use . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

12.Arducam IMX477 127


12.1. What are the Jetson Native Cameras . . . . . . . . . . . . . . . . . . . 128
12.2. What is the Advantage on Jetson Native Cameras . . . . . . . . . . . 128

13.Lens Calibration Tool 131


13.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
13.2. Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
13.3. Package Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

14.Jetson Nano MSR-Lab 133


14.1. GPIOs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
14.1.1. Control via the terminal . . . . . . . . . . . . . . . . . . . . . . 134
14.1.2. Using the GPIO Pins in a Python Program . . . . . . . . . 134
14.1.3. Visualisation with an LED . . . . . . . . . . . . . . . . . . 135

15.“Hello World” - Image Recognition 137


15.1. Loading the Example Project from GitHub . . . . . . . . . . . . . . . 137
15.2. Example Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
15.2.1. Launch of the First Machine Learning Model . . . . . . . . . . 138
15.2.2. Using the Console Programme imagenet-console.py . . . . . 138
15.2.3. Live Camera Image Recognition with imagenet-camera.py . . 138
15.3. Creating your Own Programme . . . . . . . . . . . . . . . . . . . . . . 139

III. Object Detection with the Jetson Nano 141


16.Python Installation 143

17.Virtual environments in Python 145


17.1. package python3-venv for virtual environments . . . . . . . . . . . . 145
17.2. Package pipenv for virtual environments . . . . . . . . . . . . . . . . 145
17.3. Program Anaconda Prompt-It for virtual environments . . . . . . . . 146
17.4. Program Anaconda Navigator for Virtual Environments . . . . . . . 147

18.TensorFlow 151
18.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
18.1.1. TensorFlow Framework . . . . . . . . . . . . . . . . . . . . . . 151
18.1.2. TensorFlow Use Case . . . . . . . . . . . . . . . . . . . . . . . . 151
18.1.3. TensorFlow Lite for Microcontrollers . . . . . . . . . . . . . . . 151
18.2. Sample TensorFlow Code . . . . . . . . . . . . . . . . . . . . . . . . . 152
18.3. TensorFlow 2.0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
18.4. TensorFlow and Python . . . . . . . . . . . . . . . . . . . . . . . . . . 152

18.5. Installation of TensorFlow . . . . . . . . . . . . . . . . . . . . . . . . . 152


18.5.1. Standard Installation . . . . . . . . . . . . . . . . . . . . . . . . 154
18.5.2. Installation of Intel Hardware . . . . . . . . . . . . . . . . . . . 154
18.5.3. Installation of Jetson Nano Hardware . . . . . . . . . . . . . . 154
18.6. “Hello World 1” of Machine Learning - Handwritten Digits . . . . . . . 155
18.6.1. Loading the Data Set MNIST . . . . . . . . . . . . . . . . . . . 155
18.6.2. Data Preparation . . . . . . . . . . . . . . . . . . . . . . . . . . 155
18.6.3. Structure and Training of the Model . . . . . . . . . . . . . . . 157
18.6.4. History of the Training and Visualisation with TensorBoard . . 158
18.6.5. Problems with TensorBoard . . . . . . . . . . . . . . . . . . . . 160
18.6.6. Neural network . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
18.6.7. Convolutional Neural Network . . . . . . . . . . . . . . . . . . 161
18.6.8. Complete code . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
18.7. “Hello World 2” - Detection of an Iris . . . . . . . . . . . . . . . . . . . 162
18.7.1. Loading Fisher’s Iris Data Set . . . . . . . . . . . . . . . . . . . 162
18.7.2. Data preparation . . . . . . . . . . . . . . . . . . . . . . . . . . 166
18.7.3. Structure and Training of the Model . . . . . . . . . . . . . . . 168
18.7.4. History of the Training and Visualisation with TensorBoard . . 169
18.7.5. Regularisation . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
18.7.6. Complete Code . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
18.8. "Hello World 3" - Bottle Detection . . . . . . . . . . . . . . . . . . 174
18.8.1. Dataset CIFAR-10 and CIFAR-100 . . . . . . . . . . . . . . . . 174
18.8.2. Loading the Dataset . . . . . . . . . . . . . . . . . . . . . . . . 174
18.8.3. Data Preparation . . . . . . . . . . . . . . . . . . . . . . . . . . 176
18.8.4. Structure and Training of the Model . . . . . . . . . . . . . . . 184
18.8.5. History of the Training and Visualisation with TensorBoard . . 184
18.9. Saving and Restoring a Model in HDF5 Format . . . . . . . . . . . 188
18.9.1. Required Libraries . . . . . . . . . . . . . . . . . . . . . . . 190
18.9.2. Saving during Training . . . . . . . . . . . . . . . . . . . . 190
18.9.3. Saving Weights . . . . . . . . . . . . . . . . . . . . . . . . . 191
18.9.4. Saving the Entire Model . . . . . . . . . . . . . . . . . . . . 191
18.10. Using a Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
18.11. Optimising the Training Time . . . . . . . . . . . . . . . . . . . . . 192
18.11.1. Identifying Bottlenecks . . . . . . . . . . . . . . . . . . . . 192
18.11.2. tf.data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
18.11.3. Mixed Precision Training . . . . . . . . . . . . . . . . . . . 194
18.11.4. Multi-GPU Training . . . . . . . . . . . . . . . . . . . . . . 194
18.12. Optimising Multithreading . . . . . . . . . . . . . . . . . . . . . . . 194

19.TensorFlow Lite 197


19.1. TensorFlow Lite . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
19.2. What is TensorFlow Lite? . . . . . . . . . . . . . . . . . . . . . . . . . 197
19.3. Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
19.4. TensorFlow Lite Converter . . . . . . . . . . . . . . . . . . . . . . . . . 198
19.5. TensorFlow Lite Interpreter . . . . . . . . . . . . . . . . . . . . . . . . 199
19.6. TensorFlow Lite and Jetson Nano . . . . . . . . . . . . . . . . . . . . . 199
19.6.1. Compatibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
19.6.2. Conversion of the model . . . . . . . . . . . . . . . . . . . . . . 199
19.6.3. Transfer of the model . . . . . . . . . . . . . . . . . . . . . . . 200
19.6.4. Using a model on Jetson Nano . . . . . . . . . . . . . . . . . . 200
19.7. Application example: tflite model for bottle detection with a combined
data set from CIFAR10 and CIFAR100 . . . . . . . . . . . . . . . . . 201
19.7.1. Conversion to format tflite . . . . . . . . . . . . . . . . . . . . 201
19.7.2. Test with any images . . . . . . . . . . . . . . . . . . . . . . . . 201
19.7.3. Test results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206

19.7.4. Evaluation of the Test Results . . . . . . . . . . . . . . . . . . 211


19.7.5. Conclusion and Potential for Improvement . . . . . . . . . . . . 211
19.7.6. Bottle Detection . . . . . . . . . . . . . . . . . . . . . . . . . . 212

IV. Appendix 215


20.Material List 217

21.Data Sheets 219


21.1. Data Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
21.2. Data Sheet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
21.3. Jetson Nano Developer Kit - User Guide . . . . . . . . . . . . . . . . . 259
21.4. Jetson Nano Developer Kit 40-Pin Expansion Header Configuration -
Application Note . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
21.5. Jetson Nano Developer Kit 40-Pin Expansion Header GPIO Usage
Considerations - Application Note . . . . . . . . . . . . . . . . . . . . 294

22.RASP CAM HQ Raspberry Pi - Camera, 12MP, Sony IMX477R 307

23.RPIZ CAM 16mm 311

24.RPIZ CAM 6mm WW 313

25.Data Sheets RaspCAM 315


25.1. Data Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
25.2. Data Sheet Raspberry Pi High Quality Camera . . . . . . . . . . . . . 317
25.3. Data sheet Raspberry Pi High Quality Camera - Getting started . . . 323
25.4. Data sheet Raspberry Pi High Quality Camera - component drawing . 330
25.5. Data sheet Raspberry Pi High Quality Camera - Parameter 6 mm lens 331
25.6. Data sheet Raspberry Pi High Quality Camera - Mounting the lenses . 333
25.7. Hama mini tripod "Ball", XL, black/silver . . . . . . . . . . . . . . . 335

Literature 337

Index 347
List of Figures

1.1. relationship of artificial intelligence, machine learning and deep learning 20


1.2. Example of a Künstliches Neuronales Netz (KNN) . . . . . . . . . . . 21
1.3. Overlay of the image of a panda with noise [GSS14] . . . . . . . . . . 22
1.4. Manipulation of traffic signs [Sit+18] . . . . . . . . . . . . . . . . . . . 22
1.5. Optimal image for the identification of a toaster [Sit+18] . . . . . . . 23

2.1. KDD process according to [FPSS96] (own reduced representation based


on [FPSS96]) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.2. Life cycle of a data mining project in CRISP-DM according to [Cha+00] 29
2.3. Modified model of the KDD process based on the model according to
[FPSS96] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.4. KDD process in the industrial environment . . . . . . . . . . . . . . . 30

3.1. Biological (left) and abstracted (right) model of neural networks. Source:[Ert16] 37
3.2. Representation of a neuron as a simple threshold element . . . . . . . 38
3.3. Graph with weighted edges and equivalent weight matrix . . . . . . . 38
3.4. ReLu function (left) and Leaky ReLu . . . . . . . . . . . . . . . . . . . 40
3.5. Sigmoid function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.6. Hyperbolic tangent function . . . . . . . . . . . . . . . . . . . . . . 41
3.7. Illustration of the weight imprinting process [QBL18] . . . . . . . . . . 44
3.8. Illustration of imprinting in the normalized embedding space. (a) Before
imprinting, the decision boundaries are determined by the trained
weights. (b) With imprinting, the embedding of an example (the yellow
point) from a novel class defines a new region. [QBL18] . . . . . . . . 44
3.9. The YOLO system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.10. Original image, scaling, rotation and cropping of the image . . . . . . 49
3.11. Structure of a black and white image with 10 × 10 pixels . . . . . . . . 50
3.12. Image with 3 colour channels . . . . . . . . . . . . . . . . . . . . . . . 50
3.13. Greyscale with a value range from 0 to 255 . . . . . . . . . . . . . . . 51
3.14. Black and white image, image with 4 grey levels and with 256 grey levels. 51
3.15. Softened image with a 5 × 5 filter . . . . . . . . . . . . . . . . . . . . . 52
3.16. Edge detection on an image . . . . . . . . . . . . . . . . . . . . . . . . 52
3.17. Structure of the CNN architecture according to LeCun et al. [LeC+98] 53
3.18. Definition of the convolution of two matrices . . . . . . . . . . . . . . 56
3.19. Convolution layer with 3 × 3 kernel and stride (1, 1) . . . . . . . . . . 56
3.20. Example application of a filter and an image section . . . . . . . . . 57
3.21. Visualisation of filters of a trained network [LXW19; Sha+16] . . . . . 58
3.22. Detection of vertical and horizontal edges . . . . . . . . . . . . . . . . 58
3.23. Representation of the ReLu activation function . . . . . . . . . . . . . 59
3.24. Application of an activation function with a bias b ∈ R to a matrix . . 59
3.25. Application of max-pooling and average-pooling . . . . . . . . . . . . . 60
3.26. Application of flattening . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.27. Structure of the neural network used . . . . . . . . . . . . . . . . . . . 61

4.1. Examples from the training set of the dataset MNIST [SSS19] . . . . . 65
4.2. Extract from ten illustrations of the data set Canadian Institute For
Advanced Research (CIFAR)-10 . . . . . . . . . . . . . . . . . . . . . . 66


4.3. Supercategories and categories in Canadian Institute For Advanced


Research (CIFAR)-100 with the original designations . . . . . . . . . . 67
4.4. Header lines of Fisher’s Iris Data Set . . . . . . . . . . . . . . . . . . . 68
4.5. Output of the categories of Fisher’s Iris Data Set . . . . . . . . . . . . 70
4.6. Names of the categories in Fisher’s Iris Data Set . . . . . . . . . . . . 70

5.1. TensorFlow data flow diagram . . . . . . . . . . . . . . . . . . . . . 78

6.1. Hand Landmarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87


6.2. Face Detection Using Landmarks . . . . . . . . . . . . . . . . . . . . . 88

8.1. Application example: Bottle detection . . . . . . . . . . . . . . . . . . 93

9.1. Jetson Nano: Oversight . . . . . . . . . . . . . . . . . . . . . . . . . . 96


9.2. Jetson Nano: Oversight with labels . . . . . . . . . . . . . . . . . . . . 96
9.3. Side view of the Jetson Nano with labels . . . . . . . . . . . . . . . . . 97
9.4. J44 Serial interface - 3.3V 115200 Baud, 8N1 . . . . . . . . . . . . . . 99
9.5. J40 Buttons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
9.6. J15 5V fan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
9.7. Jetson Nano SDK , source: Heise, 29.11.2019 . . . . . . . . . . . . . . 102
9.8. Dialogue for formatting a SD card . . . . . . . . . . . . . . . . . . . . 103
9.9. Etcher start dialogue . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
9.10. Unformatted file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
9.11. Windows message for an unformatted microSD card . . . . . . . . . . 104
9.12. Welcome message after the first start . . . . . . . . . . . . . . . . . . . 105
9.13. Raspberry Pi Camera Module V2[Ras19] . . . . . . . . . . . . . . . . . 110
9.14. Cable to the camera [Ras19] . . . . . . . . . . . . . . . . . . . . . . . . 111

10.1. Camera matrix calculated by the cameracalibrate function for the


logitech webcam . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
10.2. Distortion coefficients returned . . . . . . . . . . . . . . . . . . . . . . 114
10.3. Image of the chessboard before and after removing distortion . . . . . 115

11.1. Working of a camera [Kum19] . . . . . . . . . . . . . . . . . . . . . . . 117


11.2. Example of a Radial Distortion [Kum19] . . . . . . . . . . . . . . . . . 117
11.3. Example of a Tangential Distortion [Kum19] . . . . . . . . . . . . . . 118
11.4. Camera Calibration flow chart [SM20] . . . . . . . . . . . . . . . . . . 119
11.5. Effect of distortion on Checkerboard images [SM20] . . . . . . . . . . . 119
11.6. The Image of the chessboard passed to the opencv functions to identify
and draw corners . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
11.7. Result after detection of corners . . . . . . . . . . . . . . . . . . . . . 120
11.8. Camera matrix calculated by the cameracalibrate function for the
logitech webcam . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
11.9. Distortion coefficients returned . . . . . . . . . . . . . . . . . . . . . . 122
11.10.Image of the chessboard before and after removing distortion . . . . . 122

12.1. IMX477 camera from Arducam [Ard21] . . . . . . . . . . . . . . . . 127

13.1. Calibration tool from Arducam [Ard21] . . . . . . . . . . . . . . . . 131

14.1. Jetson Nano MSR-Lab in case with Pi camera, front view . . . . . . . 133
14.2. Connections of the Jetson Nano MSR-Lab in the case . . . . . . . . . 133
14.3. J41 header of the Jetson Nano MSR-Lab in the case, rear view . . . . 134
14.4. Circuit for testing the GPIO pins with LED . . . . . . . . . . . . . . . 135
14.5. Circuit for testing the GPIO pins with LED . . . . . . . . . . . . . . . 136

16.1. Python Installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143

17.1. help text for conda . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147


17.2. list of installed packages . . . . . . . . . . . . . . . . . . . . . . . . . . 148
17.3. list of all created virtual environments . . . . . . . . . . . . . . . . . . 148
17.4. Anaconda Navigator: Calling the virtual environments area . . . . . . 149
17.5. Anaconda Navigator: Creating a virtual environment . . . . . . . . . . 150

18.1. Training history of the simple model visualised with TensorBoard . . . 159
18.2. Training history of the simple model visualised with TensorBoard . . . 160
18.3. Training history of the simple model visualised with TensorBoard . . . 161
18.4. Training history of the simple model visualised with TensorBoard . . 162
18.5. Header lines of the data set Iris . . . . . . . . . . . . . . . . . . . . . . 164
18.6. Output of the categories of the data set Iris . . . . . . . . . . . . . . . 166
18.7. Names of the categories in the data set Iris . . . . . . . . . . . . . . . 166
18.8. Checking the data set Iris . . . . . . . . . . . . . . . . . . . . . . . . . 167
18.9. Variance in the data set Iris . . . . . . . . . . . . . . . . . . . . . . . . 167
18.10.Coding of the categories before (left) and after conversion . . . . . . . 168
18.11.Course of the accuracy for the training and test data during the training
of the model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
18.12.Course of the loss function for the training and test data during the
training of the model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
18.13.Course of accuracy for the training and test data during the training of
the second model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
18.14.Course of the loss function for the training and test data during the
training of the second model . . . . . . . . . . . . . . . . . . . . . . . . 174
18.15.Categories in CIFAR-100 . . . . . . . . . . . . . . . . . . . . . . . . . 176
18.16.Visualisation of the first 25 training data from the data set CIFAR-10 178
18.17.Visualisation of the first 25 training data from the data set CIFAR-100 179
18.18.Visualisation of the first 25 training data in the filtered CIFAR-100
dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
18.19.Visualisation of the transition between the two data sets . . . . . . . . 181
18.20.Visualisation of the transition generated data set . . . . . . . . . . . . 182
18.21. Visualisation of bottlenecks with nvtop, source: [Meu20] . . . . . . . 192
18.22. Visualisation of bottlenecks with TensorBoard, source: [Meu20] . . . 193
18.23. Bottleneck when loading data, source: [Meu20] . . . . . . . . . . . . 193

19.1. Tensorflow Lite workflow [Kha20] . . . . . . . . . . . . . . . . . . 198


19.2. The basic process for creating a model for an edge computer[Goo19a] . 198
19.3. Test images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204

25.1. Hama mini tripod "Ball", XL, black/silver . . . . . . . . . . . . . . . 335


Acronyms
API Application Programming Interface
CIFAR Canadian Institute For Advanced Research
CNN Convolutional Neural Network
COCO Common Objects in Context
CPU Central Processing Unit
CUDA Compute Unified Device Architecture
FP16 Floating Point 16-bit
FP32 Floating Point 32-bit
FPGA Field-Programmable Gate Array
GPIO General Purpose Input Output
GPU Graphics Processing Unit
I²C Inter-Integrated Circuit
I²S Inter-IC Sound
INT8 Integer 8-bit
IoT Internet of Things
KNN Künstliches Neuronales Netz
NN Neuronales Netz
ReLu Rectified Linear Unit
ResNet Residual Neural Network
RLE Run-Length Encoding
SBC Single-Board-Computer
SCL Serial Clock
SD-1 NIST Special Database 1
SD-3 NIST Special Database 3
SDA Serial Data
SPI Serial Peripheral Interface
SPS Speicherprogrammierbare Steuerung
SSH Secure Shell
UART Universal Asynchronous Receiver Transmitter
VPU Video Processing Unit

Part I.

Machine Learning

1. Artificial intelligence - When are
machines intelligent?
Answering the question is not easy. First, the term intelligence must be defined. Its
definition is not unambiguous. For example, it is not defined at what point a person
is considered intelligent. This raises the question of what belongs to intelligence. In
psychology, intelligence includes, among other things, the ability to adapt to new and
unknown situations, but also to solve new problems. [FVP98]

1.1. What is Artificial Intelligence?


Consequently, the term artificial intelligence is also not clearly defined. Historically,
the term was introduced by John McCarthy [McC+06]:

„The aim of AI is to develop machines that behave as if they had intelligence.“ - John
McCarthy, 1955

According to this definition, even simple machines and sensors are intelligent. They act
in accordance with a programme, which means that all reactions of the system have
been determined and fixed beforehand. If individual characteristics of humans are
considered, then a great many machines can be regarded as intelligent, especially since
machines have taken over activities that are wholly or partially performed by humans.
A simple example is computer programs, which are many times more efficient at such
tasks than humans. Hence the following definition [Ric83]:

„AI is the study of how to make computers do things which, at the moment, people
do better.“ - Elaine Rich, 1983

This approach is clever in that it circumvents the definition of intelligence. At the
same time, it points out that it is a snapshot in time. This takes into account
that the human being is very adaptable. This adaptability particularly distinguishes
humans: they are able to grasp new external influences, adapt to them, and learn
independently.

In the decade from 2010 to 2020, many impressive results were achieved in the field of
artificial intelligence. In 2011, the computer Watson defeated the human champion
in the very popular quiz show „Jeopardy“ [Fer12]. Professional players of the
board game Go were beaten by the system AlphaGo for the first time in 2015 and 2016
[Wan+16]. Even bluffing in poker was mastered by an artificial intelligence
called Libratus in 2017 [BS18]. In 2018, a computer program from the Chinese
corporation Alibaba demonstrated better reading comprehension than humans. In 2020,
the software GPT-3 wrote non-fiction texts and poetry that look as if they were created
by humans. Artificial intelligences have also become active in the art world; new songs by
the Beatles have been composed and paintings created in the style of famous painters.
These successes tempt one to conclude that computer software is superior to human
intelligence. To clarify the difference, the terms weak and strong artificial intelligence
are introduced. Strong artificial intelligence refers to all-encompassing software that
can adapt to different or new requirements. In contrast, weak artificial intelligence
means that an algorithm has been developed to solve a specific task. Such algorithms
can solve complex and demanding tasks from very different fields as well as humans or
better. The above list shows that weak AIs deliver stunning results. But strong
artificial intelligence does not exist any more than a time machine does. Figure 1.1
shows the relationships between strong and weak Artificial Intelligence, Machine
Learning and Deep Learning; according to this, Deep Learning is a subfield of Machine
Learning, which in turn is a subfield of Artificial Intelligence.

Figure 1.1.: relationship of artificial intelligence, machine learning and deep learning

1.2. Machine Learning


Machine learning refers to the ability of a machine to extract knowledge from
available data and recognise patterns. If necessary, these patterns can then be applied
to subsequent data sets. The way in which the learning process takes place can be
divided into three categories:

Supervised learning uses training and test data for the learning process. The training
data includes both input data (e.g. object metrics) and the desired result (e.g.
classification of objects). The algorithm should then use the training data to find
a function that maps the input data to the result. Here, the function is adapted
independently by the algorithm during the learning process. Once a certain
success rate has been achieved for the training data, the learning process is
verified with the help of the test data. An example of this would be a clustering
procedure in which the clusters are already known before the learning process
begins (a minimal code sketch of this case follows after this list).

Unsupervised learning uses only input data for the learning process, where the out-
come is not yet known. Based on the characteristics of the input data, patterns
are to be recognised in this. One application of unsupervised learning is the
clustering of data where the individual clusters are not yet defined before the
learning process.

Reinforcement learning is based on the reward principle for actions that have taken
place. It starts in an initial state without information about the environment or
about the effects of actions. An action then leads to a new state and provides a
reward (positive or negative). This is done until a final condition occurs. The
learning process can then be repeated to maximise the reward.
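
To make the supervised case above concrete, the following minimal sketch trains a classifier on Fisher's Iris data set, which is also used later in this book. The use of scikit-learn and of a k-nearest-neighbour model here is an illustrative assumption, not something prescribed by this chapter.

# Minimal sketch of supervised learning: the training data contain both the
# input features and the desired result (the class labels); the test data are
# used afterwards to verify the learned mapping.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)                    # inputs and known labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = KNeighborsClassifier(n_neighbors=3)          # illustrative model choice
model.fit(X_train, y_train)                          # learning from labelled data
print("Accuracy on unseen test data:", model.score(X_test, y_test))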

Figure 1.2.: Example of a KNN with input layer, hidden layer and output layer

1.3. Deep Learning


Deep Learning represents a sub-field of machine learning and uses a Neuronales Netz
(NN) for the learning process. Such networks are modelled on the human brain and
its neuronal processes. A Künstliches Neuronales Netz (KNN) consists of nodes
representing neurons. A distinction is made between three categories of neurons.

input neurons are the neurons that receive signals from the outside world. Here there
is one neuron for each type of input (feature).

hidden neurons are neurons that represent the actual learning process.

output neurons are the neurons that give signals to the outside world. There is one
neuron for each type of output (feature).

All neurons of a category are combined into a layer. Thus, in each neural network
there is an input layer (green neurons in Figure 1.2), a hidden layer (blue neurons in
Figure 1.2), and an output layer (red neurons in Figure 1.2). If a KNN contains more
than one hidden layer, it is called a deep neural network. The connections between
the neurons of the individual layers are called synapses. Each synapse carries a weight,
which is multiplied by the signal of the source neuron; thus, the individual signals are
weighted. The weights are in turn adjusted during the learning process on the basis
of functions.
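
As an illustration, the following minimal Keras sketch builds such a network with one input layer, one hidden layer and one output layer. The layer sizes and activation functions chosen here are illustrative assumptions rather than values taken from this chapter.

# Minimal sketch of a KNN with an input, a hidden and an output layer in Keras.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),              # one input neuron per feature
    tf.keras.layers.Dense(8, activation="relu"),    # hidden layer
    tf.keras.layers.Dense(3, activation="softmax"), # one output neuron per class
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()  # lists the layers and the number of trainable weights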

1.4. Application
Due to the adaptability of artificial intelligences, they can be used in many different
ways. In its study „Smartening up with Artificial Intelligence (AI)“, McKinsey
described eight application scenarios in which artificial intelligence has particular
potential [Mck].

• Autonomous Vehicles

• Predictive maintenance through better forecasting

• Collaborative robotics with perception of the environment (machine-machine


interaction & human-machine interaction)

• Quality improvement through autonomous adaptation of machines to products


to be processed
• Automated quality inspection

• Supply chain management through more accurate forecasting

• Research and development

• Automated support processes

1.5. Limits of Artificial Intelligence

Image recognition has made huge progress in recent years []. Nevertheless, limitations
are also becoming apparent. In this context, the image of a panda from a paper by
Ian Goodfellow and other researchers is shown [GSS14]. An algorithm identifies the
panda with a probability of 57.7%. If the image is overlaid with a second image that
a human perceives as noise, a gibbon is identified with a probability of 99.3%, even
though in a human's eyes the image has not changed. The situation is illustrated in
Figure 1.3.

Figure 1.3.: Overlay of the image of a panda with noise [GSS14]

The example with the panda is harmless. In contrast, the effects in road traffic can
be devastating. Research shows that by simply manipulating road signs, artificial
intelligence can make false interpretations. [Sit+18] Well-defined manipulations can
be carried out by selectively altering an image. In image 1.4, a sign marking a
speed limit is transformed into a stop sign by adding spots.

Figure 1.4.: Manipulation of traffic signs [Sit+18]

Conversely, an optimal image can be determined from the neural networks that
delivers a result with a probability close to 100%. This also produces images that
the human eye could not recognise as such. A classic example is the toaster in
image 1.5. [Bro+17]

Figure 1.5.: Optimal image for the identification of a toaster [Sit+18]
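
The noise overlay in the panda example is not random; in [GSS14] it is computed from the gradient of the network's loss with respect to the input image (the "fast gradient sign method"). The following sketch illustrates the idea in TensorFlow; the choice of MobileNetV2, the preprocessing assumptions and the value of epsilon are illustrative assumptions, not details taken from this chapter.

# Sketch of a fast-gradient-sign perturbation: the image is shifted a tiny step
# in the direction that increases the classification loss the most.
import tensorflow as tf

model = tf.keras.applications.MobileNetV2(weights="imagenet")  # any pre-trained classifier
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

def adversarial_overlay(image, true_label, epsilon=0.007):
    """Return image + epsilon * sign(d loss / d image).
    `image` is assumed to be a single picture, already resized and preprocessed
    to the input range the model expects."""
    image = tf.convert_to_tensor(image[None, ...], dtype=tf.float32)
    label = tf.convert_to_tensor([true_label])
    with tf.GradientTape() as tape:
        tape.watch(image)
        prediction = model(image)
        loss = loss_fn(label, prediction)
    gradient = tape.gradient(loss, image)
    # The added noise is barely visible to a human but can flip the top class.
    return image + epsilon * tf.sign(gradient)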
2. Processes for Knowledge Acquisition
in Databases
2.1. Knowledge Discovery in Databases (KDD Process)
Knowledge discovery in databases, or KDD for short, is the decisive step in the
examination of data. There are various approaches to knowledge discovery. However,
in order to achieve the goal efficiently and systematically, a suitable process is of
crucial importance. In this section, the KDD process is considered in detail. In
this process, knowledge discovery is divided into several steps. [Düs00]

2.1.1. Application Areas


The KDD process can basically be applied whenever large amounts of data are available
from which knowledge can be extracted. The KDD process was presented in the
article by Fayyad [FPSS96]. There, astronomy, investment, marketing, fraud detection,
manufacturing processes and telecommunications are given as examples of application
areas. But the KDD process is not limited to these areas. Thus, [MR10] presents
various applications in the fields of finance, medicine, detection of attacks on computer
systems, and customer relationship management. Although these examples do not exploit
all elements of the KDD process in detail, their application in these areas demonstrates
the possible use of the entire KDD process with data mining as a core element. [PUC02]
describes the application of the KDD process to data sets on tourist destinations,
which can lead to insights into possibilities of increasing the attractiveness of individual
destinations.
The literature mentioned shows the versatility of the KDD process. Basically, it can
be used everywhere where sufficient data is available and a generalisation of certain
structures is permissible.

2.1.2. Process Model


One difficulty regarding the KDD process is that „no general process model of Knowledge
Discovery in Databases is currently available“ [Düs00]. There are several models
that show the process steps starting from the data to the extracted knowledge. One of
the first and, according to [KM06], the most cited model is the one according to Fayyad
[FPSS96]. Therefore, it will be presented in the following. According to [KM06], it is
mainly the KDD process according to Fayyad and CRISP-DM that are also applied
in the technical/engineering field, while many of the other models are rather aimed
at areas in and around marketing. Therefore, both (Fayyad and CRISP-DM) will be
described. In addition, a generic model and a model designed for application with
edge computers will be presented.

2.1.3. KDD-Prozess based on Fayyad


On an abstract level, the field of KDD is concerned with the development of methods
and techniques to „make sense of data“ [FPSS96]. This requires going through several
steps. The iterative nature of the process should be noted: Again and again, one can
and must return to the previous steps and make corrections based on the knowledge
gained. It may be necessary to obtain further data or introduce other pre-processing
steps. [Wro98]
The basic flow of the KDD process and the iteration loops included can be seen in
Figure 2.1.

[Figure: Databases → Data Selection → Data Preparation → Data Transformation → Data Mining → Evaluation/Interpretation → Knowledge]

Figure 2.1.: KDD process according to [FPSS96] (own reduced representation based
on [FPSS96])

At the beginning of the KDD process there is always the available database from which
the knowledge is to be gained. From this, the data to be used must be selected from
one or more databases and the data must then be processed and prepared. Only then
can the data be used for knowledge extraction with the help of data mining methods.
After evaluation and possibly several runs through the various steps, the knowledge
gained can finally be extracted and used in the application.
Fayyad further divides this process into nine sub-steps, some of which are not directly
represented in Figure 2.1. These nine steps are described below.

1. Domain Knowledge (Developing an Understanding of the Application Domain)


First, the domain of the application must be defined and understood. This includes
the domain knowledge needed for the application to be able to perform the analysis in
a way that leads to real added value. In addition, the precise objectives of the KDD
process must be defined for the use case and the specific end user of the knowledge
[FPSS96; KM06]. This step forms the basis for making the decisions necessary in the
next steps. [MR10]

2. Creating a Target Data Set


Next, a data set or a subset of a data set is selected. Single or multiple variables
can be selected to narrow down the data from which knowledge is to be extracted.
[FPSS96] This involves determining what data is available and possibly obtaining any
data that is still missing. It is important that all necessary information is included.
Deciding which variables are necessary must be based on domain knowledge. After
selecting the variables, as many representatives of the variables as possible should
become available to the algorithm for a good model. On the other hand, the collection,
storage and processing of larger data sets is more costly and many variables can also
lead to results that are difficult to interpret. Therefore, the relevance of returning to
steps that have already been carried out becomes apparent as early as this step.[MR10]

3. Data Cleaning and Preprocessing


This step aims to improve data reliability. This includes removing noise and outliers,
but also dealing with missing data. If an attribute is not reliable enough, a data
mining method can already be used at this point to improve the data.[MR10] Time
series information and known changes must also be taken into account. [KM06] Data
cleaning and preparation is the most time-consuming step in the KDD process. Its
share of the time to be spent on the KDD process is estimated by many sources to be
about 60 %.[KM06]

4. Data Reduction and Transformation (Data Reduction and Projection)

To further improve the data, useful features are sought to represent the data [FPSS96].
Possible methods include dimensionality reduction and transformation. The imple-
mentation of this step will vary greatly depending on the application and domain.
As in the previous steps, the most effective transformation may emerge during the
KDD process [MR10]. The importance of this step is made clearer by the application
example in section 2.6.

5. Choosing the DM Task

After the preparation of the data has been completed, a suitable data mining method
is selected. Which method is suitable depends largely on the objectives defined in the
first step [FPSS96]. A basic distinction is made between descriptive methods, which
aim at interpreting and understanding the data, and predictive methods, which provide
a behavioural model that is able to predict certain variables for previously unseen
data. For this purpose, regressions can be used on the one hand, and classification
methods such as decision trees, support vector machines or neural networks on the
other. Some of these models can also be used for descriptive statements. [MR10]

6. Choosing the DM Algorithm

In this step, the specific data mining algorithm to be used is selected. If, for example,
it was decided in the previous step that a predictive method should be chosen, a choice
must now be made between, for example, neural networks and decision trees, whereby
the former provide more precise descriptions of the correlations, but the latter are
easier to interpret. [MR10]

7. Data Mining

In the seventh step, data mining is finally applied. Often data mining and KDD
are used synonymously, but in fact data mining is a sub-step of the KDD process.
[FPSS96] This step may also have to be carried out several times, possibly even
immediately one after the other, to optimise the hyperparameters. [MR10]

8. Interpreting the Results (Interpreting Mined Patterns)

After the data mining has been successfully completed, the results are evaluated and
interpreted. The visualisation of found patterns or models can be helpful for this.
In this step, the influence of the data preparation steps is reflected on and possibly
returned to. It must also be evaluated whether the model found is plausible and usable.
[FPSS96; MR10]

9. Applying the Results (Acting on Discovered Knowledge)

The discovered knowledge can now be used. The product of this can be documentation
or reports that record the knowledge obtained so that it can be used by interested
parties. Furthermore, the extracted knowledge can be integrated into another system,
e.g. in the form of a prediction model. By applying it under real conditions, e.g. with
dynamic instead of static data, deficits can become apparent that still need to be
reworked. [FPSS96; MR10]

2.2. Generic Process Model


Based on the models of Fayyad et al., Cabena et al., Cios et al. and Anand & Buchner
and the CRISP-DM model, [KM06] describes a generic KDD model. This is structured
very similarly to Fayyad’s model described above and also runs through the steps
iteratively. However, the steps are summarised differently, so that the model gets by
with six instead of nine steps.

2.2.1. Application Domain Understanding


As with Fayyad, the first step is to understand the application and objective as a
prerequisite for the following steps.

2.2.2. Understanding the Data (Data Understanding)


The general understanding of the data corresponds roughly to the second step according
to Fayyad. It involves looking at the available data set and identifying opportunities
and problems.

2.2.3. Data Preparation and Identification of DM Technology


This step combines steps three to six according to Fayyad. This makes sense insofar
as the type of data preparation and the data mining method to be applied influence
each other and a clear separation is therefore difficult.

2.2.4. Data Mining


The data mining step as the core of the KDD process remains unchanged.

2.2.5. Evaluation
The generic model also emphasises the iterative nature of the KDD process. Therefore,
an evaluation of the obtained results is indispensable.

2.2.6. Knowledge Deployment (Knowledge Consolidation and Deployment)
This step also hardly differs from the last process step according to Fayyad. The
different models mainly differ in the way the discovered knowledge is used, which is
due to the different application domains. While the models geared towards business
applications mainly focus on reports and visualisation, the technical applications have
a stronger emphasis on dynamic uses of created models.

2.3. CRISP-DM
While the first KDD models seem to be strongly oriented towards business applications,
the CRoss-Industry Standard Process for Data Mining (CRISP-DM) is intended to be
applicable to all industries and thus also to technical problems. CRISP-DM was
developed in 1996 as a joint project of the companies DaimlerChrysler, SPSS and NCR.
The aim was to embed data mining in an industry-, tool- and application-independent
process. The methodology is described with a hierarchical process model consisting of
four abstraction levels: phases, generic tasks, specialised tasks and process instance.
While the phases are always run through in the same way and the generic tasks, which
are described in more detail by the specialised tasks, remain the same, the last level
deals with the individual scope of application. [Cha+00]

The CRISP-DM process model contains the life cycle of a data mining project as shown
in Figure 2.2. The phases shown are largely consistent with the phases described
in the generic model. Ultimately, the KDD process is embedded in CRISP-DM.
As mentioned, individual generic tasks are assigned to each phase, which must be
processed in accordance with the specific task. In the phase „Data preparation“, this
includes data selection, cleansing, construction, integration and formatting. [Cha+00]

[Figure: cyclical process of Business Understanding → Data Understanding → Data Preparation → Modeling → Evaluation → Deployment, centred on the data]

Figure 2.2.: Life cycle of a data mining project in CRISP-DM according to [Cha+00]

This model also emphasises the importance of returning to past phases during the
process. In Figure 2.2 only the most important and frequent connections between the
phases are shown. The outer circle represents the cyclical nature of the data mining
process itself. [Cha+00]

2.4. Application-Oriented Process Models


The models shown remain very abstract and bear little relation to actual applications
in a technical environment. In order to represent the actual processes when
using different systems for knowledge generation and for use in the field, the following
representations can be helpful.
For the description of artificial intelligence projects with the Jetson Nano, a modified
KDD process model is assumed, which largely corresponds to the model according to
Fayyad, but still contains some minor changes. This modified process model can be
seen in Figure 2.3.

[Figure: Databases → Data Selection → Data Preprocessing → Data Transformation → Data Mining → Evaluation/Interpretation → Model]

Figure 2.3.: Modified model of the KDD process based on the model according to
[FPSS96]

The lines marking the iterative loops are solid and not dashed to emphasise the
absolute necessity of these iterative correction steps. However, the feedback from the
model is shown dashed because in this process model it is assumed that the knowledge
gained is used in the form of a model on an edge computer, here the Jetson Nano.
This complicates direct feedback and corrections.
In Figure 2.4 the distribution of the steps on different levels as well as localities and
systems becomes clear. This illustration refers to the application in a production
environment. It would be conceivable, for example, to create a system for process
control that detects defective workpieces with the help of a camera. The database
would be collected with this camera. The use of the data, including the steps explained
in the previous sections, takes place in an office outside of production. However, the
results found are used again in the control system in production, where they are
exposed to new data. This new data can be added to the database(s), and possibly
a feedback loop from the results in the field to a review and correction of the models
in the office can also be realised.

[Figure: in production, the databases, new data and results interact with the model/patterns; in work preparation, Data Selection → Data Preparation → Data Transformation → Data Mining → Evaluation/Verification are carried out]

Figure 2.4.: KDD process in the industrial environment

2.5. Role of the User


Brachman and Anand [BA94] criticise that the usual definitions of KDD are too
focused on the resulting information and do not adequately reflect the complexity of
the process. According to them, the role of the human being must be emphasised
more, even if in the long run there is a move towards fully autonomous integrations
of the KDD process. Front-end requirements must be recognised and reconciled, and
data preparation, analysis and evaluation, as well as monitoring of the individual
steps through visualisation, also require domain knowledge on the part of the user
and his or her intervention. To support this, the authors recommend a corresponding
working environment (Knowledge Discovery Support Environment) that allows easy
data exchange between the different work steps as well as facilitates the reuse of already
created structures. In addition, a system could support the user in the selection of
suitable methodologies.

2.6. KDD Process using the Example of Image Classification with an Edge Computer
To look at the process in relation to a tangible application, the development of an
image classification with an Edge Computer as a KDD process will be highlighted as
an example.

2.6.1. Database
As a data basis for an image classification, a sufficient number of images including their
correct classification in terms of the application, the label, must be available. There
are various image databases that are freely accessible, including ImageNet, MS-COCO
and CIFAR-10, for example. Of course, only those databases come into consideration that
contain the objects and features that the model is to recognise later. [Den+09; Den12;
SW16; Aga18a]
File formats and colour depth as well as the number of colour channels should also
already be taken into account when searching for suitable databases, as a (lossless)
transformation to the required file formats is not possible in every case or the database
might contain less information than desired. Which file formats, colour depths and
colour channels to use depends on the desired application, the training platform used
and the algorithm used.

2.6.2. Data Selection


The images to be used can now be selected from the databases that have been identified,
are available and can be used for the application. For example, a database could be
used in its entirety, only a part of it could be used, or several databases could
be combined. It is extremely important that all classes that are to be recognised
are also present in the selected data. How many images are selected depends on the
planned scope of the training and the test, which has a significant influence on the
training time, but possibly also on the performance of the model obtained.

2.6.3. Data Preparation


Data preprocessing is usually primarily concerned with data cleaning. This mainly
includes the removal of outliers. In terms of images, these outliers are mainly noisy
or poorly resolved images. These should be removed or processed with suitable
algorithms. When using images from the well-known open-source image databases,
such as ImageNet [Den+09], image quality should not be a problem.
Furthermore, the images must be converted into the format supported by the respective
platform. It should be taken into account that compressed file formats cannot always
be decompressed without loss.

2.6.4. Data Transformation


Finally, the images must be transformed into a representation that can be used in data
mining. It may be necessary to adjust the colour depth per channel or the number of
channels. An RGB image may need to be transformed into a greyscale image. It is
also common to normalise the images, for example to mean 0 and standard deviation
1.
The algorithm chosen for the data mining phase also determines the input format of
the data. Applied to images, this concerns the file format as well as the resolution,
i.e. the number of pixels in height and width, and the colour depth. In this
phase the images are adjusted accordingly.
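As a minimal sketch of this transformation step, the following Python snippet (assuming NumPy and Pillow are installed; the file name sample.jpg and the target size of 224 × 224 pixels are purely illustrative) converts an image to greyscale, resizes it and normalises it to mean 0 and standard deviation 1:

import numpy as np
from PIL import Image

img = Image.open("sample.jpg").convert("L")     # "L" = single greyscale channel
img = img.resize((224, 224))                    # assumed input size of the network
x = np.asarray(img, dtype=np.float32) / 255.0   # scale pixel values to [0, 1]
x = (x - x.mean()) / (x.std() + 1e-8)           # normalise to mean 0, standard deviation 1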

2.6.5. Data Mining


In the data mining step, the actual model building takes place. First, the algorithm
must be chosen or the architecture defined and the model built accordingly. If deep
learning algorithms are used, the architecture must be defined. With the definition
of the architecture, the hyperparameters are determined. Then the algorithm can be
trained with the selected images. During training, the weights are optimised. Their
adjustment is done on the basis of the training images. After a given number of
training images, the validation images are used to test the accuracy of the current
model. After several epochs, i.e. several runs of all training images, the model can be
tested.
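The following Python sketch outlines such a training run with validation after each epoch. It assumes PyTorch; the concrete model and the datasets train_set and val_set are left open and stand for any image classifier and labelled image data sets:

import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train(model, train_set, val_set, epochs=10, lr=1e-3, batch_size=32):
    # One epoch is one complete pass over all training images.
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for epoch in range(epochs):
        model.train()
        for images, labels in DataLoader(train_set, batch_size=batch_size, shuffle=True):
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)   # error of the current weights
            loss.backward()                         # backpropagation
            optimizer.step()                        # adjust the weights
        model.eval()
        correct = 0
        with torch.no_grad():                       # validation images test the current model
            for images, labels in DataLoader(val_set, batch_size=batch_size):
                correct += (model(images).argmax(dim=1) == labels).sum().item()
        print(epoch, correct / len(val_set))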

2.6.6. Evaluation/Verification
In the first instance, the interpretation and evaluation of the trained model is done
repeatedly during the training with the help of the validation data set. In the second
instance, the model is tested directly after training with the help of the test images
that are still unknown to the model. If the results are satisfactory, the model can
be integrated into the application. In this instance, too, it is important to test the
performance of the model in order to be able to draw conclusions and make corrections.
In the dynamic application, other framework conditions prevail, so that drawing
conclusions about necessary corrections in the previous steps may require great domain
knowledge.
The interpretation of the found model is not trivial for neural networks, especially for
complex networks like Convolutional Neural Network (CNN). It may not be possible
to extract direct correlations that could be transferred to other applications.

2.6.7. Model
The trained model can finally be integrated into the intended application. While the
data preparation and training take place on a computer that is generally equipped
with high computing power, the model is intended to be used on an edge computer.
For example, real-time object recognition could be performed with a camera connected
to the Edge Computer based on the training with the static images. Transferring the
model to the edge computer may require converting the model to a different format.

2.7. Data Analysis and Data Mining Methods at a Glance
The following is an overview of some data mining techniques.

2.7.1. Definition Data Analysis


Data analysis attempts to extract information from large amounts of data with the help
of statistical methods and to visualise it afterwards. The complex field of data analysis
can be divided into the areas of descriptive, inferential, exploratory and confirmatory
data analysis. In descriptive data analysis, the information of the individual data,
which was taken from a total survey, for example, is condensed and presented in such
a way that the essentials become clear. If only the data of a sample survey (partial
survey) is the basis, the focus of the data analysis rests on the transfer of the sample
findings to the population. This is referred to as inferential data analysis. Explorative
data analysis is concerned with processing the available amount of data in such a way
that structures in the data as well as correlations between them can be revealed and
highlighted to a particular degree. In contrast, the aim of confirmatory data analysis
is to verify correlations.

2.7.2. Statistical Data Analysis


Statistical data analysis usually begins with calculations of the mean and standard
deviation (or variance). It also involves checking the data for normal distribution.

Hypothesis Test
For the application of the hypothesis test, two hypotheses are always formulated:
a null hypothesis H0 , which mostly implies that the presumed data structure does
not exist, and the alternative hypothesis H1 or HA , which mostly implies that the
presumed data structure exists. The hypothesis test refers to the assumption of a true
null hypothesis [ABF11]. It must be proven that the null hypothesis is false. First,
values such as the T-value ("The T-value is the quotient of the arithmetic mean of the
differences between two variables being compared and the estimate of the standard
error for this mean in the population." [ABF11]) or the F-value are calculated. These
values are compared with a critical value adjusted to the situation. If the calculated
value is smaller than the critical value, the null hypothesis is retained. In
this case, the P-value, i.e. the probability of obtaining the observed result or a more
extreme one under the null hypothesis, is larger than the chosen significance level. If, on the other hand, the
P-value is very small, the null hypothesis is rejected. It is considered relatively certain
that the null hypothesis is false; however, it is not known which hypothesis is correct.
Conversely, if the null hypothesis is not rejected, it cannot be concluded that it is
correct [ABF11]. In this case, the result cannot be interpreted.

Normal Distribution, P-Test, T-Test


The P-value (significance value) indicates how extreme the value of the test statistic
calculated on the basis of the data collected is. It also indicates how likely it is to
obtain such a sample result or an even more extreme one if the null hypothesis is
true. Since the value is a probability value, it takes on values between zero and one.
The P-value therefore indicates how extreme the result is: the smaller the P-value,
the more the result speaks against the null hypothesis. Usually, a significance level
α is set before the test and the null hypothesis is then rejected if the P-value is less
than or equal to α [ABF11]. In order to be able to make decisions whether the null
hypothesis should be rejected or retained, fixed limits have been established at about
5 %, 1 % or 0.1%. If the null hypothesis is rejected, the result is called statistically
significant. The size of the P-value does not indicate the size of the true effect.
The T-test refers to a group of hypothesis tests with a T-distributed test statistic
[ABF11], but is often differentiated into one-sample T-test and two-sample T-test. The
prerequisite for carrying out the T-test is the normal distribution of the population of
data to be examined. Furthermore, there must be a sufficiently large sample size. If
the data are not normally distributed, the T-test cannot be used and the principle of
the U-test is applied. The one-sample T-test uses the mean value of a sample to test
whether the mean value of the population differs from a given target value. The classic
T-test (two-sample T-test), on the other hand, tests whether the mean values of two
independent samples stem from populations with the same mean. It is assumed
that both samples come from populations with the same variance. The variance is
the square of the standard deviation. The greater the variance, the flatter the normal
distribution curve. If two samples of different sizes are compared, the weighted variance
must also be determined. Here, the larger sample has the greater influence on the result.
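A minimal sketch of both T-test variants with SciPy follows; the synthetic samples are only for illustration and do not correspond to any data set discussed in this book:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(loc=5.0, scale=1.0, size=30)     # sample A
b = rng.normal(loc=5.5, scale=1.0, size=30)     # sample B

t1, p1 = stats.ttest_1samp(a, popmean=5.0)      # one-sample T-test against a target value
t2, p2 = stats.ttest_ind(a, b, equal_var=True)  # classic two-sample T-test (equal variances)
print(p1, p2)                                   # reject H0 if p <= chosen significance level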

ANOVA (Analysis of Variance)


„Analysis of variance, usually called ANOVA in German, primarily looks for differences
between groups and tests whether dividing the data into different groups reduces
unexplained variability.“[Dor13] Prerequisites for analysis of variance are normally
distributed values in each group, approximate equality of standard deviations and
independence of the measured values from each other. It is tested whether the mean
values of several groups differ. In the simplest case, the null hypothesis is: The mean
values of all groups are equal. Then the following alternative hypothesis results: Not
all mean values are equal. The test variables of the procedure are used to test whether
the variance between the groups is greater than the variance within the groups. This
makes it possible to determine whether the grouping is meaningful or not, or whether
the groups differ significantly or not. In its simplest form, the analysis of variance is
an alternative to the T-test. The result is the same as with the T-test, „because the
results of a one-way analysis of variance (one-way ANOVA) and a T-test are identical
if the two samples have the same variance.“[Dor13]
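A one-way ANOVA can be sketched with SciPy as follows; the three groups are synthetic and serve only as an illustration:

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
g1 = rng.normal(10.0, 2.0, size=25)
g2 = rng.normal(11.0, 2.0, size=25)
g3 = rng.normal(10.5, 2.0, size=25)

f_value, p_value = stats.f_oneway(g1, g2, g3)   # H0: the mean values of all groups are equal
print(f_value, p_value)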

2.7.3. Overview of Data Mining Methods and Procedures


Typical methods of data mining are:

• cluster analysis: grouping of objects based on similarities,


• classification: elements are assigned to existing classes,
• association analysis: identification of correlations and dependencies in the data,
• regression analysis: identification of relationships between variables,
• outlier detection: identification of unusual data sets,
• correlation analysis: examines the relationship between two variables,
• summary: transformation of the data set into a more compact description
without significant loss of information.

Here, outlier detection as well as cluster analysis belong to the observation problems;
classification and regression analysis belong to the prediction problems.

Cluster Analysis & Classification


Cluster analysis is used to reveal similarity structures in large data sets with the
aim of identifying new groups in the data. The procedures for finding such similarity
groups can be graph-theoretic, hierarchical, partitioning or optimising. The objects assigned to a
cluster should be as homogeneous as possible, the objects assigned to different clusters
should be very heterogeneous [JL07]. In addition, several characteristics can be used
in parallel when forming clusters. The cluster analysis is carried out in the following
steps.
At the beginning, each object is considered as a single cluster. Then, the two individual
objects that are most similar to each other are merged. The union reduces the number
of clusters by one. Afterwards, all distances of the individual objects are calculated
again and the two objects with the smallest distance are combined into a new cluster.
This could theoretically be repeated until all objects are united in a single cluster, a
so-called megacluster. For the analysis of the data, however, it is much more important
to determine the clustering that makes the most sense. The number of clusters is
chosen by looking at the variance within and between the groups, since no termination
condition is specified for the procedure itself. As a rule, it is advantageous to compare
several different cluster divisions.
Prerequisites for the application of cluster analysis are metrically scaled character-
istics, which can simply be included in the analysis; ordinally and nominally scaled
characteristics must be recoded as dummy variables [JL07]. Characteristics that are
scaled in different dimensions can lead to a bias in the results. These values must be
standardised before carrying out a cluster analysis, for example by a Z-transformation
[JL07]. Furthermore, the characteristics should not correlate with each other.
The distance between two individual objects is determined by the distance measure.
The larger the measure, the more dissimilar the objects are. The various cluster

methods [JL07] are used to determine the distance between two clusters or a cluster
and a single object. Classification, on the other hand, assigns data to already existing
groups.
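The agglomerative procedure described above can be sketched with SciPy; Ward linkage is only one of several possible cluster methods, and the two artificial point groups are invented for illustration:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(2)
data = np.vstack([rng.normal(0.0, 0.5, size=(20, 2)),    # first artificial group
                  rng.normal(3.0, 0.5, size=(20, 2))])   # second artificial group

merges = linkage(data, method="ward")                  # merge the closest clusters step by step
labels = fcluster(merges, t=2, criterion="maxclust")   # cut the hierarchy into two clusters
print(labels)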

Association Analysis & Regression Analysis


„Association analysis generates rules that describe the relationships between the
elements (items) that occur in the records of a dataset.“ [GC06] These dependencies are
represented in the form If item A, then item B, or A → B. An item is the expression
of an attribute value of a data set. An example
of a simple rule would be: If a customer buys beer, then 70 per cent of the time he
will also buy chips. These relationships are not assumed as hypotheses, but are to
be discovered from the data with the association analysis. Only after a conspicuous
pattern has been found is it investigated whether it is really a dependency and, if
so, association rules are established for it. The parameters of association rules are
support, confidence and lift. The greater the support value, the more relevant the
rule.
Regression analysis is the analytical procedure for calculating a regression in the
form of a regression line or function. The regression indicates which directional
linear relationship exists between two or more variables [GC06]. The coefficient of
determination (R2) expresses how well the regression line reflects the relationship
between the independent and dependent variable. R2 lies between 0 and 1, whereby
the value R2 = 1 would mean that every observed data point lies directly on the
regression line. By determining a regression function, no statement can be made about
the significance of a correlation. The significance of the regression is determined by
the F-test.
At the beginning of the regression analysis is the preparation of the data. Missing
data are omitted or filled in, data are transformed and the interactions (in linear
regression) are taken into account. By means of mathematical procedures, a function
f is determined so that the residuals e become minimal (residuals: difference between
a regression line and the measured values [Kä11]). Model validation, i.e. checking
whether the model is a good description of the correlation, includes the

• residual analysis,

• checking for overfitting,

• examining the data for outliers and influential data points, and

• multicollinearity of the independent variables.

The validated model can be used to forecast values of y given values of x. To estimate
the uncertainty of the prediction, a confidence interval is often given in addition to
the predicted y-value.
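A minimal sketch of a simple linear regression with NumPy, including the coefficient of determination R2, is given below; the noisy data are synthetic:

import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(0.0, 1.0, size=x.size)   # noisy linear relationship

b1, b0 = np.polyfit(x, y, deg=1)           # least-squares regression line y = b1*x + b0
y_hat = b1 * x + b0
r2 = 1.0 - np.var(y - y_hat) / np.var(y)   # coefficient of determination
print(b1, b0, r2)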

Outlier Detection
Outliers are measured values or findings that are inconsistent with the rest of the
data, for example, by having unusual attribute values or not meeting expectations.
Expectation is most often the range of dispersion around the expected value in which
most measured values are found. „Robust bounds for detecting outliers for many
distribution types can also be derived based on quartiles and quartile distance.“[SH06]
In exploratory data analysis, values that lie more than 1.5 times the quartile distance
outside this interval are called outliers. Particularly extreme outliers are shown separately
in the boxplot.
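The quartile-based rule can be sketched in a few lines of Python; the measurement values are invented for illustration:

import numpy as np

values = np.array([4.1, 4.3, 4.0, 4.2, 4.4, 4.1, 9.8, 4.2, 3.9, 4.3])

q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1                                         # quartile distance
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr         # robust bounds
print(values[(values < lower) | (values > upper)])    # -> [9.8]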

The Local Outlier Factor procedure, for example, searches for objects whose density
deviates significantly from that of their neighbours; this is referred to as
density-based outlier detection. Identified outliers are then usually verified manually
and removed from the data set, as they can worsen the results of other procedures.
Before deciding in favour of removing values, it is therefore still necessary to check in
each case what data loss occurs when deleting or labelling the missing values. If the
number of available data sets falls below the level necessary to proceed, the removal
of the outliers should be avoided.

Correlation Analysis
An important task of data analysis is the analysis of the correlation between individual
characteristics. The strength of the connection between two quantitative characteristics
is called correlation in descriptive statistics and inferential statistics and can be
differentiated into linear and non-linear correlation. In multivariate data sets, the
correlation coefficient is additionally calculated for each pair of variables.[Gö] „For
correlation analysis, mainly methods of classical, multivariate, robust and exploratory
statistics are used, but also various non-linear regression methods whose approximation
errors can be used as correlation measures.“[Run15] A prerequisite for performing
correlation analysis is the normal distribution of the data under investigation.
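A minimal sketch of a correlation analysis with SciPy and NumPy follows; the data are synthetic and only illustrate the calls:

import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(size=100)
y = 0.8 * x + rng.normal(scale=0.5, size=100)   # linearly related with noise

r, p = stats.pearsonr(x, y)                     # correlation coefficient and P-value
print(r, p)

data = np.column_stack([x, y, rng.normal(size=100)])
print(np.corrcoef(data, rowvar=False))          # pairwise coefficients of a multivariate set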

Summary as a Method of Data Mining


By transforming a data set into a more compact description of its information,
summarisation ensures a lossless representation of essential information. The summary
can be textual, visual or combined.
3. Neural Networks
While computers are superior to humans in many areas, there are nevertheless tasks
that a human can solve intuitively thanks to their intelligence and ability to learn, while
transferring these abilities to the computer is a
great challenge. A popular approach to the development of artificial intelligence
lies in artificial neural networks. These are information-processing systems whose
structure and mode of operation are modelled on the nervous system, especially
the brains of animals and humans.
Neural networks consist - as their name suggests - of neurons, which are inter-
connected and influence each other. Like their biological counterparts, the modelled
neurons can fire if sufficiently stimulated by one or more input signals. The
biological neuron can be compared to a capacitor that is charged by small electrical
voltage pulses from other neurons. If the voltage thus reached
exceeds a threshold value, the neuron itself transmits a voltage pulse to connected
neurons. This principle is reproduced in artificial neural networks by mathematical models.
Figure 3.1 shows the first stage of abstraction from biological neural networks to
mathematical models. [Ert16]

Figure 3.1.: Biological (left) and abstracted (right) model of neural networks.
Source:[Ert16]

In 1958, Frank Rosenblatt presented the simplest form of a neural network. This
model, known as the single-layer perceptron, contains a layer of input neurons,
all connected to an output neuron. [Dör18] If
several layers of neurons are connected in series, we speak of
a multilayer perceptron. The output of a neuron acts as an input signal for the following
neurons, until a final output takes place at the output layer.

3.1. Mathematical Modelling


The birth of neural network AI can be seen in the 1943 publication „A logical calculus
of the ideas immanent in nervous activity“[MP43] by McCulloch and Pitts. In their
description, neurons can have either the output value 0 for inactivity or 1 for activity.
This is referred to as binary neural networks. (Sometimes the activity level -1 for
inactivity is also used, cf. [Ert16]). With McCulloch and Pitts, the input values for
a neuron are added together and the neuron is set to activity level 1 if this sum exceeds a
predefined threshold θ. One therefore also speaks of „threshold elements“ [Kru+15].
A representation of how this works, taking into account different weights w1 and w2,
can be seen in Figure 3.2. Several neurons are connected in layers to form networks in
order to be able to map complex relationships. As will be described in the following,
different weights (section 3.1.1), as well as activation and output functions (section 3.1.3)
with continuous (instead of binary) output values and different architectures play a
role in more complex neural networks. With the mathematical modelling of the neuron,
however, McCulloch and Pitts laid the foundation for the further development of
neural networks, although the computer technology of the time was not yet sufficiently
powerful for the simulation of simple brain structures. [Ert16]

Figure 3.2.: Representation of a neuron as a simple threshold element

3.1.1. Weights
In order to be able to map different relationships between the activations of the
neurons, the information flows between the neurons are weighted differently. In
addition, constant input values can provide a neuron-specific bias, the weighting of
which may also need to be determined. [Moe18] In the representation of the network
as a directed graph, the weights are usually specified at the edges. Mathematically,
the representation as a matrix is common. An example of the graph representation
and the equivalent weight matrix can be seen in Figure 3.3. The weights are often
named by convention with first the index of the neuron receiving the signal as input
and then the index of the sending neuron.

Figure 3.3.: Graph with weighted edges and equivalent weight matrix

At the beginning, the weights are set to random values. Usually values between -1
and 1 are chosen, although the values resulting from the training can also lie outside

this interval. [Moe18] For certain(!) training methods, initialisation with zeros is also
conceivable. [KK07]

3.1.2. Bias

The bias is a neuron-specific offset that can be interpreted as an additional input with a
constant value. The bias value can be modelled either explicitly or as the weighting of an
additional constant input value and can strengthen or weaken the result of a neuron.
[Zie15]

3.1.3. Activation and Output Function

Which value a neuron passes on depends on the incoming signals, their weighting,
the activation function and the output function. From the incoming output signals
of the connected neurons (and possibly the feedback of the own output signal of the
considered neuron itself), the activation function f calculates the activity level of the
neuron, taking into account the weighting of the signals. [Kru+15] The output of the
neuron xi is calculated from the wij weighted incoming signals of the n neurons xj
with j = 1...n.

x_i = f\left( \sum_{j=1}^{n} w_{ij} x_j \right)

In the simplest case, the activation function is linear; the signals are then added in a
weighted fashion and possibly scaled with a constant factor. More complex, non-linear
relationships can only be modelled using non-linear activation functions in multi-layer
networks.[KK07] Non-linear functions facilitate generalisation and adaptation to diverse
data and are therefore widely used. [Gup20]
The output value of the neuron is then determined from the activity level of the
neuron determined in this way, with the help of the output function. In most cases,
the identity is used as the output function [Bec18a], so that often only the activation
function is addressed and the output function is neglected.
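A single neuron according to this formula can be sketched in Python as follows; the tanh activation and the example values are only illustrative assumptions:

import numpy as np

def neuron_output(weights, inputs, activation=np.tanh):
    # Weighted sum of the incoming signals, followed by the activation function;
    # the identity is assumed as the output function.
    return activation(np.dot(weights, inputs))

x = np.array([0.5, -1.0, 2.0])    # outputs x_j of the connected neurons
w = np.array([0.1, 0.4, -0.3])    # weights w_ij
print(neuron_output(w, x))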

Function ReLu

The Rectified Linear Unit function is considered the most commonly used activation
function. It is defined as R(z) = max(0, z), i.e. all negative values become zero and
all values above zero remain unchanged. However, this can also lead to problems, as
negative activations are ignored and neurons are deactivated. This is what the Leaky
ReLu function is designed to correct, using a low slope <1 in the negative range (see
figure 3.4). Both functions and their derivatives are monotonic. This is important so
that the activation cannot decrease with increasing input values. [Gup20; Sha18]

Figure 3.4.: ReLu function (left) and Leaky ReLu function (right)
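Both functions can be sketched in a few lines of NumPy; the slope of 0.01 for the Leaky ReLu is just one common choice:

import numpy as np

def relu(z):
    return np.maximum(0.0, z)              # R(z) = max(0, z)

def leaky_relu(z, slope=0.01):
    return np.where(z > 0, z, slope * z)   # small slope < 1 in the negative range

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(z), leaky_relu(z))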

Function Sigmoid
The sigmoid function transforms the input values into the range [0,1], see figure 3.5.
The function for this is S(z) = 1/(1 + e^{-z}). Due to the range of values, the function is
well suited for the representation of probabilities. It is always positive, so that the
output cannot have a weakening influence on the subsequent neurons. The function is
differentiable and monotonic, but its derivative is not monotonic. [Gup20; Sha18]

Figure 3.5.: Sigmoid function

Function Softmax
The softmax function is often described as a superposition of several sigmoid functions.
It is mainly used in the output layer to reflect the probabilities for the different
categories. The mathematical expression for the softmax function is [Gup20]:
\sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}

Function tanh
The tangent hyperbolic function is a non-linear function in the value range from −1
to 1. It is similar to the sigmoid function with the major difference of point symmetry
at the origin. The function is defined as:
\tanh(z) = \frac{2}{1 + e^{-2z}} - 1

Strongly negative inputs are mapped to strongly negative outputs, and inputs close to zero
are mapped close to the zero point of the diagram. Negative outputs can occur, which can also negatively affect
the weighted sum of a neuron of the following layer. This function is particularly well
suited to the differentiation of two classes. The function is differentiable, monotonic
and the derivative is non-monotonic. [Gup20] Figure 3.6 shows the hyperbolic tangent
function in the domain from −6 to 6.

Figure 3.6.: Hyperbolic tangent function
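The sigmoid, softmax and tanh functions can be sketched with NumPy as follows; the softmax implementation shifts the inputs by their maximum, a common trick for numerical stability that is not required by the formula itself:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # values between 0 and 1

def softmax(z):
    e = np.exp(z - np.max(z))         # shift for numerical stability
    return e / e.sum()                # probabilities that sum to 1

z = np.array([-1.0, 0.0, 2.0])
print(sigmoid(z))
print(softmax(z))
print(np.tanh(z))                     # values between -1 and 1, point-symmetric at the origin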

3.2. Training
The great strength of neural networks lies in their ability to learn. In neural networks,
the learning process consists of adapting the weights with the aim of outputting the
desired output value(s) in the output layer in the case of corresponding input signals
in the input layer. For this purpose, the weights are first initialised randomly and
then adjusted according to the learning rules.

3.2.1. Propagation
Propagation refers to the process in which information is sent via the input layer
into the network and passes through it to the output layer. If the information only
moves from the input to the output layer, it is referred to as a forward network or
forward propagation. At the beginning, the input is copied into the input layer and
then forwarded to the first hidden layer. What information this layer passes on to the
next depends on the weights and the activation functions. Finally, the output can be
read from the output layer. [Kri08]

3.2.2. Error
Error is the deviation between the actual network output and the desired network
output after the input data has been propagated through the entire neural network.
[Kri08]

3.2.3. Delta Rule


The basis of the learning process is the delta rule. For each input data point in the
learning phase, the difference between the actual and expected (correct) output is
determined. Then the weights are changed so that the error becomes smaller. To
determine the direction in which the weights must be changed, the derivative of the
error function is used (gradient descent). The strength of the adjustment is determined
by the learning rate. This process is repeated iteratively until the termination criterion,
e.g. a sufficiently small error, is met.[KK07]

\Delta w_{ij} = \varepsilon \, (x_i - l_i) \, f'\left( \sum_j w_{ij} x_j \right)

This rule can only be applied directly to the weights between the output layer and
the layer before it. For the layers before it, the error must be propagated backwards
through the network, since the desired outputs for the hidden networks are not given.
[Kru+15]
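A minimal sketch of a single delta-rule step in Python is shown below; the sigmoid activation and the example values are assumptions for illustration, not the book's exact setup:

import numpy as np

def delta_rule_step(w, x, target, lr=0.1):
    net = np.dot(w, x)
    out = 1.0 / (1.0 + np.exp(-net))                # forward pass with sigmoid activation (assumption)
    grad = (target - out) * out * (1.0 - out) * x   # error times derivative of the activation
    return w + lr * grad                            # adjust the weights to reduce the error

w = np.array([0.2, -0.1, 0.4])
x = np.array([1.0, 0.5, -1.0])
print(delta_rule_step(w, x, target=1.0))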

3.2.4. Error Backpropagation


In the backpropagation procedure, the change in the values of the hidden neurons is
recursively calculated from the change in the neurons in the higher layer. For this, the
network output must first be calculated (forward propagation). The error obtained is
used in backward propagation to adjust the weights from layer to layer. This process
is applied to all training data until the termination criterion is met. [Ert16]
This procedure is well established for training neural networks. Nevertheless, there are
some difficulties that need to be taken into account. For example, the procedure can
get stuck in a local minimum of the error function, so that the global minimum cannot
be found. This problem can be mitigated by running the procedure with different
starting values. [Kru+15]

3.2.5. Problem of Overfitting


If a neural network is repeatedly trained with known data sets, the network may learn
the data by heart instead of developing an abstract concept. This condition
is called overfitting. For this reason, not all data should be used for training. It makes
sense to withhold a validation set and a test set. [Bec18c]

3.2.6. Epochs, Batches and Steps


An epoch is a complete run of all input data, with the measurement of the error, and
backpropagation to adjust the weights. For large data sets, the input data can be
divided into groups of equal size, called batches. In this way, a neural network can be
trained more quickly. It is important that the values within each batch are normally
distributed. For example, if a data set of 2,000 images is divided into batches of 10
images each, the epoch will run through 200 steps. Alternatively, the number of steps
can be specified and the batch size adjusted accordingly.[Bec18b]
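The relationship between data set size, batch size, steps and epochs can be sketched in a few lines; the numbers follow the example above:

dataset_size = 2000                                 # number of training images
batch_size = 10                                     # images per batch
steps_per_epoch = dataset_size // batch_size        # -> 200 steps per epoch
epochs = 5
print(steps_per_epoch, epochs * steps_per_epoch)    # -> 200 1000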

3.2.7. Dropout
Dropout disables some neurons by multiplying a defined percentage of neurons by
zero during forward propagation. These neurons no longer have any influence on
subsequent activations. Deactivation occurs on a random basis per run. If a neuron
is fixated on a feature and fails, the surrounding neurons learn to compensate for this.
This increases the generalisation of the network: it learns more abstract
concepts. A disadvantage is the increase in training time, because the parameter
feedback is noisy. At the end of the training, all neurons are reactivated. [Bec18b]

3.2.8. Correct Classification Rate and Loss Function


The correct classification rate measures the ability of the network to assign data points
to the correct class. The loss function, in contrast, quantifies the deviation between the
network output and the desired output. [Cho18]
Both are often documented during the training process to allow conclusions to be
drawn about the training process and to identify problems such as overfitting or too
few runs.

3.2.9. Training Data, Validation Data and Test Set


The training data is actively used to teach the network the features of the classifications.
The validation data is fed to the network at the end of each epoch to test how accurately
the network performs. Even if the validation set is not actively used for training, some
information flows back into the network. For this reason, a test set is used that is not
related to the training of the neural network and only has the purpose of checking the
theoretical accuracy of the network. [Bec18c]

3.2.10. Transfer Learning


Transfer learning means the transfer of the results of a finished neural network to a
new task. Layers can be kept constant and only the output layer can be retrained, or
several layers can be retrained on the basis of the current training status. If several or
all layers are retrained, the weights of the trained network are used as the starting
point. Which strategy should be used depends on two factors:

• similarity of the data

• size of the new data set

If the data sets are highly similar, transfer learning is a good way to improve a neural
network. Similarities exist, for example, in a trained network for the recognition of
dog breeds, which would also be suitable for the recognition of cat breeds. However,
this network would be unsuitable for the recognition of different car models, as there
would be too many deviations in the basic structures. As far as the size of the data
set is concerned, transfer learning may be particularly suitable for small data sets, as
otherwise overfitting would quickly occur. [Bec18c]

3.2.11. Weight Imprinting


Retraining all the weights in a model for transfer learning can be a time consuming
process. In actuality, it’s only the last layer of the network that decides the final
classification, so only retraining the last set of weights could achieve the desired
result. Weight imprinting is one such method developed by engineers at Google. Their
proposed imprinting method is to directly set the final layer weights for new classes
from the embeddings of training exemplars. Consider a single training sample x+
from a novel class, their method computes the embedding Φ(x+ ) and uses it to set
a new column in the weight matrix for the new class, i.e. wx = Φ(x+ ). Figure 1
illustrates this idea of extending the final layer weight matrix of a trained classifier by
imprinting additional columns for new categories. Intuitively, one can think of the
imprinting operation as remembering the semantic embeddings of low-shot examples
as the templates for new classes. Figure ?? illustrates the change of the decision
boundaries after a new weight column is imprinted. The underlying assumption is that
test examples from new classes are closer to the corresponding training examples, even
if only one or a few are observed, than to instances of other classes in the embedding
space.

Figure 3.7.: Illustration of the weight imprinting process [QBL18]

Figure 3.8.: Illustration of imprinting in the normalized embedding space. (a) Before
imprinting, the decision boundaries are determined by the trained weights.
(b) With imprinting, the embedding of an example (the yellow point) from
a novel class defines a new region. [QBL18]

Average Embedding
If n > 1 examples x_+^{(i)}, i = 1, ..., n, are available for a new class, new weights are computed by
averaging the normalized embeddings

\tilde{w}_+ = \frac{1}{n} \sum_{i=1}^{n} \Phi(x_+^{(i)})

and re-normalizing the resulting vector to unit length, w_+ = \tilde{w}_+ / \|\tilde{w}_+\|. In practice,
the averaging operation can also be applied to the embeddings computed from the
randomly augmented versions of the original low-shot training examples.
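The imprinting and averaging steps can be sketched with NumPy; the embedding dimension, the random weight matrix and the example embeddings are purely illustrative assumptions:

import numpy as np

def imprint(W, embeddings):
    # Normalise each embedding, average them, re-normalise to unit length and
    # append the result as a new column of the final-layer weight matrix W.
    E = np.asarray(embeddings, dtype=np.float64)
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    w_tilde = E.mean(axis=0)
    w_plus = w_tilde / np.linalg.norm(w_tilde)
    return np.column_stack([W, w_plus])

d, num_classes = 64, 10
W = np.random.randn(d, num_classes)       # trained final-layer weights (illustrative)
new_embeddings = np.random.randn(3, d)    # embeddings of three low-shot examples
print(imprint(W, new_embeddings).shape)   # -> (64, 11): one additional class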

Fine-tuning
Since the model architecture has the same differentiable form as ordinary ConvNet
classifiers, a finetuning step can be applied after new weights are imprinted. The
average embedding strategy assumes that examples from each novel class have a
unimodal distribution in the embedding space. This may not hold for every novel class
since the learned embedding space could be biased towards features that are salient
and discriminative among base classes. However, fine-tuning (using backpropagation
to further optimize the network weights) should move the embedding space towards
having unimodal distribution for the new class. [QBL18]

3.3. Convolutional Neural Networks


Complex tasks such as pattern and object recognition with high-dimensional data
sets lead to convergence problems and high computation times with conventional
approaches. Such tasks can be solved with deep learning algorithms. To this class
belong CNN. They can solve tasks with less preprocessing than is usual for other
classification algorithms. Moreover, the filters used do not have to be designed by the
user, but are learned by the network itself during training. [Sah18]
According to [SJM18], one of the first successful implementations of CNNs was the
recognition of handwritten digits in 1998, which is described in the paper „Gradient-Based
Learning Applied to Document Recognition“ [LeC+98]. In the meantime,
CNNs have been able to prove themselves in many application areas. The typical
application for CNNs is processing data that has a known network topology. For
example, they can recognise objects in images or patterns in time series. [Nam17]
Besides image classification, object recognition and tracking, text, speech and gesture
recognition are also typical applications, as [Gu+18] describes. But CNNs are also used
in industry for defect detection. [LP20]
In general, CNNs are composed of the building blocks Convolutional Layer, Pooling
Layer and Fully Connected Layer, where each building block is usually used more
than once. These building blocks are described below. The primary assumption is the
use of CNNs for image classification.

3.3.1. Convolutional Layer


The main components of CNNs are filters that are applied to the input data/images in
the Convolutional Layer. As input, the Convolutional Layer receives an n-dimensional
tensor, for example an image with three colour channels, and applies several kernels
to it. By convolution of tensor and kernel, the image is filtered. The results of the
convolution with the different filters are stacked so that the result is again a tensor.
[Mic19]
The operation of convolution between two matrices or tensors here means summing the
products of the corresponding elements of both matrices or
tensors. Since the convolution in the Convolutional Layer usually takes place between
a tensor with many entries and a small kernel, the convolution must be carried out
separately for individual parts of the input tensor. Usually, one starts with the first
entries at the top left and, metaphorically speaking, shifts the kernel after the first
convolution relative to the input tensor by a certain number of columns and calculates
the convolution again. When one has reached the right edge, the kernel is shifted in
the row and the process is repeated. The value called stride indicates by how many
columns and rows the kernel is shifted relative to the input tensor. In addition, an
activation function is usually applied for each calculated pixel. [Mic19]
In this process, a tensor with dimensions h1 × w1 × d1 and kernels each with dimension
h2 × w2 × d1 , which are stacked to form dimensions h2 × w2 × d2 , becomes an output
tensor with dimensions h3 × w3 × d2 . [Aru18]
Here h3 and w3 depend on the height and width of the input tensor as well as the
filters and the chosen stride, while d2 is given by the number of filters.
If the size of the input tensor, the kernel and the stride do not match accordingly,
pixels at the edge of the image cannot be reached according to the procedure described
above. Therefore, sometimes additional pixels are added. This is called padding, or
zero-padding, because the added pixels are usually filled with the value zero. [Mic19]
The parameters learned in a CNN are the filters themselves. For k filters of dimension
m × n, considering one bias term per filter, a total of k · m · n + k parameters must be
learned. It is worth noting that this number is independent of the size of the input
image, whereas in conventional feed-forward networks the number of weights depends
on the size of the input.[Mic19]
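A minimal sketch with PyTorch illustrates the output dimensions and the parameter count of a convolutional layer; note that the kernels in this library also span the input depth d, so the count here is k · m · n · d + k:

import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1)

x = torch.randn(1, 3, 224, 224)       # one RGB image as an input tensor
print(conv(x).shape)                  # -> torch.Size([1, 16, 224, 224])

# 16 filters of size 3 x 3 over 3 input channels plus one bias per filter:
print(sum(p.numel() for p in conv.parameters()))   # 16*3*3*3 + 16 = 448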

The convolutional layer does not necessarily have to be the first layer, but can
also receive preprocessed data as input. Usually, multiple Convolutional Layers are
used.[Mic19]
The filters of the first Convolutional Layer detect low-level features such as colours,
edges and curves, while higher layers detect more abstract features. By stacking
multiple Convolutional and Pooling Layers, higher level feature representations can be
progressively extracted. Moreover, the dimensionality is also reduced continuously,
which reduces the computational cost. [Gu+18; Sah18]

3.3.2. Pooling Layer


Usually, each Convolutional Layer is followed by a Pooling Layer. Sometimes both are
referred to together as one layer because no weights are learned in the pooling layer.
[Mic19]
It reduces the resolution of the previous feature maps by compressing them, thus
reducing complexity and hence computational cost. It also makes the network more
robust to small variations in the learned features and focuses on the most important
patterns. [Nam17]
There are different types of pooling. The most common are maximum and average
pooling. In maximum pooling, only the maximum value is stored from each section
of the tensor covered by the kernel. With average pooling, on the other hand, the
average value is determined for each section. [Sah18]
The pooling layer does not introduce any new learnable parameters, but it does
introduce two new hyperparameters, namely the dimension of the kernel and the stride.
[Mic19]
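A maximum-pooling layer with a 2 × 2 kernel and stride 2 can be sketched with PyTorch as follows; the feature-map dimensions are illustrative:

import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)   # keep only the maximum of each 2 x 2 section
x = torch.randn(1, 16, 224, 224)               # feature maps from a convolutional layer
print(pool(x).shape)                           # -> torch.Size([1, 16, 112, 112])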

3.3.3. Fully Connected Layer


After several convolutional and pooling layers, the image is flattened into a single
vector. This output is fed into a neural feed-forward network (one neuron per entry
in the vector). This network is fully connected and thus able to learn the non-linear
correlations between the extracted features and arrive at a classification.[Sah18]
The ReLu function is usually used as the activation function. In the output layer,
the softmax function is used to reflect the probabilities of each class to be assigned.
[Aru18]

3.3.4. Hyperparameter
When building and/or training a CNN, several parameters need to be considered.
The architecture of the network and the order of the layers need to be determined.
There are new approaches to different convolutional methods from which to choose.
In addition, the size and number of filters and the stride must be defined. The
pooling procedure must also be defined, as well as the activation functions in the fully
networked layers. Likewise, it must be defined how many neurons the input layer
should have (this influences the possible image size) and how many layers and neurons
per layer the Fully Connected Layer should be composed of.
Many considerations must be made regarding the database. The complexity of the
network to be trained depends mainly on whether only one input channel (black and
white or greyscale) is used or three (RGB). If only one colour channel is to be used
to reduce complexity, the images must be prepared accordingly. Likewise, the size
(number of pixels, height and width) of the images may need to be adjusted to fit the
architecture used. Normalising the images can also be helpful.
Since the images are presented to the neural network as data arrays, the file format
does not play a direct role. Indirectly, however, the different image qualities of the
various formats can influence the performance of the networks. For example, networks
trained with high-resolution images may produce poor results when they are asked
to classify compressed images. [DK16] Differences can also arise from the different
compressions, so different file formats should not be mixed.

3.3.5. Fine-tuning
Fine-tuning allows selected layers of the convolutional part of a trained network to be
retrained for a new task. With a small number of training images, the network can be
transferred to a new task. This is much faster than designing and training a neural
network from scratch. [Cho18]

3.3.6. Feature Extraction


In feature extraction, convolutional layers (convolutional layer and pooling layer) are
extracted from a trained CNN and a new classifier, i.e. a fully connected layer with
new classifications, is added. Two problems arise in this process:

• The representation only contains information about the probability of occurrence
of the class trained in the basic model.

• The fully connected layer contains no information about where the object is
located.

Should the position of an object in the image matter, the fully connected layers are
largely useless, as the later layers only generate abstract concepts. [Cho18]
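
A minimal feature-extraction sketch with Keras; VGG16, the input size and the five-class head are placeholders chosen for illustration, not the setup of this book:

import tensorflow as tf
from tensorflow.keras import layers, models

# Convolutional base of a trained CNN, used purely as a feature extractor
CONV_BASE = tf.keras.applications.VGG16(weights='imagenet', include_top=False,
                                        input_shape=(150, 150, 3))
CONV_BASE.trainable = False              # the extracted layers are not retrained

MODEL = models.Sequential([
    CONV_BASE,                           # reused convolutional and pooling layers
    layers.Flatten(),
    layers.Dense(256, activation='relu'),
    layers.Dense(5, activation='softmax')    # new classifier with new classes
])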

3.3.7. AlexNet
The most popular CNNs for object detection and classification from image data are
AlexNet, GoogleNet and ResNet50. [SJM18]
Gu et al. [Gu+18] refer to the development of AlexNet in 2012 as a breakthrough in
large-scale image classification. Nevertheless, its architecture is simple enough for its
operation to remain comprehensible. Therefore, it seems a good choice for first
attempts at training one's own network.
AlexNet consists of five Convolutional Layers and three Fully Connected Layers. The
number and size of the filters as well as their stride are shown in table 3.1 for the
sake of clarity. The first and second convolutional layers are each followed by a
response normalisation layer, also known as batch normalisation. This additional
layer is intended to mitigate the effect of unstable gradients through normalisation.
After the response normalisation layers as well as after the fifth convolutional layer, a
max-pooling layer follows. Overlapping pooling was chosen as the pooling method.
This means that the individual sections to which pooling is applied overlap. The
pooling area measures 3x3 and the stride 2 pixels. ReLu is applied to the output
of each Convolutional Layer and each of the Fully Connected Layers. Each Fully
Connected Layer consists of 4096 neurons. The output layer uses the softmax function
to map the probabilities for the different categories via the output neurons. [KSH12a;
Ala08]

Convolutional Layer    Number of Kernels    Dimension       Stride

1                      96                   11 × 11 × 3     4
2                      256                  5 × 5 × 48      1
3                      384                  3 × 3 × 256     1
4                      384                  3 × 3 × 192     1
5                      256                  3 × 3 × 192     1

Table 3.1.: Convolutional Layer Specifications


In total, AlexNet consists of 650,000 neurons. About 60 million parameters need to be
trained. [KSH12a]
This puts it about in the middle compared to the number of learnable parameters in
other successful CNNs.
The input images for AlexNet need to be cropped to size 227 × 227. Three colour
channels are used (see table 3.1), so no conversion to greyscale is necessary, but one
usually normalises the data to mean 0 and standard deviation 1.[Ala08]

3.3.8. YOLO
YOLO (You Only Look Once) is a state-of-the-art real-time object detection system
that performs classification and localisation (with bounding box) much faster than
conventional systems.[Red21]
Detection (in the sense of classification AND localisation) of objects is more complex
than mere classification. In conventional systems, the image is divided into many
regions, each of which is classified to determine the regions that contain the detected
object. [Cha17]
In YOLO, on the other hand, the entire image is given once to a single neural network,
which decomposes the image itself into individual regions, determining bounding boxes
and classification probabilities for each of these regions themselves, and weighting the
bounding boxes with the probabilities.

Figure 3.9.: The YOLO system

The network architecture of YOLO is based on GoogLeNet. It consists of 24 convolutional
layers followed by two fully connected layers. The first 20 convolutional layers
are pre-trained with the ImageNet dataset over a period of one week. The images
in ImageNet have a resolution of 224 × 224, while the final YOLO network takes images
with a resolution of 448 × 448 as input. Also, the training always takes place with
the entire images, without prior subdivision. [Red+15]
The advantage of YOLO lies primarily in its speed, which allows a video stream to be
processed with a latency of less than 25 ms. This makes YOLO 1000 × faster than
R-CNN and 100 × faster than Fast R-CNN. Since the entire image is presented to the
network and not just individual parts of it, contextual information can be taken into
account. For example, errors caused by interpreting parts of the background as objects
occur less frequently than with other systems. In addition, what the system learns can
be better transferred to other domains, so that objects learned from photographs, for
example, are also recognised in artistic representations more reliably than with other
systems. [Red+15; Red21]
The YOLO code as well as the code used for the training are open source.
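
For illustration, a pre-trained YOLO network can be run with OpenCV's DNN module. This is only a sketch: the configuration and weight files (yolov3.cfg, yolov3.weights) and the image name are placeholders that have to be obtained separately.

import cv2
import numpy as np

NET = cv2.dnn.readNetFromDarknet('yolov3.cfg', 'yolov3.weights')
IMAGE = cv2.imread('dog.jpg')
BLOB = cv2.dnn.blobFromImage(IMAGE, 1 / 255.0, (416, 416), swapRB=True, crop=False)
NET.setInput(BLOB)
OUTPUTS = NET.forward(NET.getUnconnectedOutLayersNames())

# Each row of an output contains the bounding box (x, y, w, h), an objectness
# score and the class probabilities of one candidate detection.
for row in np.vstack(OUTPUTS):
    scores = row[5:]
    if scores.max() > 0.5:
        print('class', int(scores.argmax()), 'confidence', float(scores.max()))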

3.4. Images and Image Classification with a CNN


The classification of images is divided into two areas: image classification and object
classification. Object classification attempts to identify one or more objects. The
identification is done by assigning them to predefined classes. As a rule, the results are
given with probabilities. Image classification attempts to identify a suitable description
for the image; in this case, the image is recorded and interpreted in its entirety. [ZJ17]
Several challenges exist when classifying images. Images mean large amounts of data:
an image with 200 × 200 pixels has 40,000 pixels, and if it is a colour image with three
colour channels, one processes 120,000 attributes. Furthermore, it is desirable that the
result of the classification is independent of affine mappings, that is, independent of
scaling or rotation of the image. If only a section of the image is examined, the
corresponding result should still be delivered. Figure 3.10 shows an image with the
corresponding operations. The classic „Lena“ of image processing is used as the basis
of the demonstration [Mun96]. This is an image of Lena Soderberg from 1972 that is
available in the USC-SIPI Image Database [Web97].

Figure 3.10.: Original image, scaling, rotation and cropping of the image

[10 × 10 matrix of pixel values 0 and 1; the zeros form the pattern shown in the image]

Figure 3.11.: Structure of a black and white image with 10 × 10 pixels

Resolution = width × height


VGA = 640 × 480
HD = 1280 × 720
Full HD = 1920 × 1080
4K = 3840 × 2160

Table 3.2.: Standard resolutions of camera images

3.4.1. Representation of Images


For a computer programme, an image is a matrix of numbers. The type of matrix
is determined by the resolution, the number of colours and the colour depth. The
resolution is the pair of values indicating how many elements, called pixels, the matrix
has in width and in height. A simple example is shown in figure 3.11. The image has a
resolution of 10 pixels in width and in height. The values can be either 0 or 1; in this
illustration the value 0 represents the colour black and 1 the colour white.
Typical resolutions for camera images are listed in table 3.2. A pixel is defined
by the number of colours and the colour depth. The colour depth indicates the range of
values taken by the individual elements. These are usually integer values, specified
in bits. Thus, an 8-bit value corresponds to a number between 0 and 255. A pixel
can now contain such a numerical value for each colour. Often, however, a separate
matrix is used for each colour. If one works with the RGB representation, one has the
colours red, green and blue. The colour impression is obtained by superimposing the
three images. The situation is shown in figure 3.12.

Often greyscale images are used instead of colour images, typically with 256 grey
levels. When converting a colour image with three colour channels, the three matrices
are added with weights. It has been shown that an equal weighting of the colour channels is

Figure 3.12.: Image with 3 colour channels


inappropriate. The OpenCV library for processing images uses the following weighting
[Tea20a]:

0.299 · Red + 0.587 · Green + 0.114 · Blue


The conversion can be applied to the image „Lena“; the result is shown in figure ??. Other
weights are used depending on the application. The weights above were determined
empirically and are proportional to the sensitivity of the human eye to each of the
three primary colours. The image in figure ?? has a colour depth of 8 bit. Other colour
depths can also be used. A black and white image has the lowest colour depth.
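
The weighted conversion can be reproduced, for example, with OpenCV and NumPy; the file name lena.png is a placeholder for a local copy of the image:

import cv2
import numpy as np

IMG_BGR = cv2.imread('lena.png')                     # OpenCV loads images in BGR order
GREY_CV = cv2.cvtColor(IMG_BGR, cv2.COLOR_BGR2GRAY)  # uses the weighting given above

# The same conversion written out explicitly
B, G, R = IMG_BGR[:, :, 0], IMG_BGR[:, :, 1], IMG_BGR[:, :, 2]
GREY_NP = (0.299 * R + 0.587 * G + 0.114 * B).astype(np.uint8)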

Figure 3.13.: Greyscale with a value range from 0 to 255

Figure 3.14.: Black and white image, image with 4 grey levels and with 256 grey levels.

However, the quality of the image does not only depend on the parameters mentioned;
the sharpness of the image also contributes to it. Unfortunately, there is no simple
definition of sharpness, as it is often shaped by subjective perception. It is, however,
possible to influence image sharpness by mathematical operations. An image can be
blurred by smoothing: each pixel value is replaced by the average of the pixel values
in a small window centred on that pixel.

Figure 3.15.: Softened image with a 5 × 5 filter

Figure 3.16.: Edge detection on an image

The resolution of the image affects the amount of computation required. For an image
with a resolution of 674 × 1200 and a window size of 101 × 101, the new pixel value
must be calculated separately for all three colour channels. Per colour channel this results in

674 · 1200 · 101 · 101 ≈ 8.2 · 10^9

operations. An example is shown in figure 3.15 with a filter size of 5 × 5.
Another possibility of image processing is edge detection. Here the goal is to identify
regions of the image where the colour intensities differ greatly. This is used to delineate
objects and for object recognition. An example is given in figure 3.16.
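
Both operations are available, for example, in OpenCV; the kernel size and the Canny thresholds below are example values and the file name is a placeholder:

import cv2

IMG = cv2.imread('lena.png')
BLURRED = cv2.blur(IMG, (5, 5))        # each pixel becomes the mean of a 5 × 5 window
GREY = cv2.cvtColor(IMG, cv2.COLOR_BGR2GRAY)
EDGES = cv2.Canny(GREY, 100, 200)      # marks regions with strong intensity changes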

3.5. Convolution Neural Network


One focus of machine learning is the interpretation of images. Two cases must be
distinguished here:

1. Object detection
2. Image classification

When detecting objects, an image is examined to see whether one or more objects can be
identified. If an object is detected, the coordinates of the surrounding rectangle and
the class or classes of the object are returned, each with a probability. With
segmentation, instead of the coordinates of the surrounding rectangle, a polygon
that surrounds the object is returned. In contrast, when classifying an image,
a description of the image is searched for. In both cases CNNs are commonly used;
they are described in the following sections.
A CNN expects an image as input information. If images do not contain colour
information, they are represented as matrices whose number of columns and rows
correspond to the number of pixels in width and height. In the case that an image
Figure 3.17.: Structure of the CNN architecture according to LeCun et al. [LeC+98]

Layer    Type               Feature Maps    Size       Kernel Size    Stride    Activation function
Input    Image              1               32 × 32    -              -         -
1        Convolution        6               28 × 28    5 × 5          1         tanh
2        Average Pooling    6               14 × 14    2 × 2          2         tanh
3        Convolution        16              10 × 10    5 × 5          1         tanh
4        Average Pooling    16              5 × 5      2 × 2          2         tanh
5        Convolution        120             1 × 1      5 × 5          1         tanh
6        Neural Network     -               84         -              -         tanh
Output   Neural Network     -               10         -              -         softmax

Table 3.3.: Structural elements of LeNet

contains colour information, such a matrix is created for each colour. In this case, one
speaks of a colour channel or simply of a channel.
A CNN is a commonly used, shift-invariant method for extracting adaptive features.
Its success has contributed to the popularity of machine learning. In its history, its
architectures have undergone a great evolution. Some well-known architectures are
now described.

3.5.1. CNN Architectures


3.5.2. LeNet
In 1998, the CNN architecture LeNet was the first CNN architecture to use backpropagation
for practical applications. It thus transported Deep Learning from theory to
practice. LeNet was used for handwritten digit recognition and was able to outperform
all other existing methods. The architecture was very simple, with only 5 layers
consisting of 5 × 5 convolutions and 2 × 2 average pooling, but it paved the way for better
and more complex models. [LeC+98] The listing of elements in table 3.3 contains
several points that will be described later.

3.5.3. AlexNet
The AlexNet architecture was the first to use CNN models implemented on Graphics
Processing Unit (GPU)s. In 2012, this really connected the growing computing power
with Deep Learning. AlexNet is significantly deeper and more complex than LeNet.
Its authors also started using ReLU activations instead of sigmoid or tanh, which helped train
better models. AlexNet not only won the 2012 ImageNet classification competition,
but beat the runner-up by a margin that suddenly made non-neural models almost
obsolete. [KSH12a]

3.5.4. InceptionNet
The InceptionNet architecture exploited the increased capabilities of the hardware. The
architecture was again much deeper and richer in parameters than the existing models.
To deal with the problem of training deeper models, the authors used multiple
auxiliary classifiers placed within the model to prevent the gradient from vanishing.
One of their ideas was to use kernels of different sizes in parallel, thus increasing the
width of the model instead of the depth. This allows such an architecture to extract
both larger and smaller features simultaneously. [Sze+14]

3.5.5. VGG
Many architectures tried to achieve better results by using larger convolution kernels.
For example, the AlexNet architecture uses kernels of size 11 × 11, among others. The
VGG architecture took the path of using several kernels of size 3 × 3 in succession
together with more non-linearities in the form of activation functions. This again improved
the results. [Zha+15; SZ15]

3.5.6. Residual Neural Network (ResNet)


The deeper the architectures became, the more often the problem occurred during
training that the magnitude of the gradient became too small. Backpropagation
then stalled and the parameters could not be determined. To prevent this problem,
the Residual Neural Network (ResNet) architecture introduced residual or shortcut
connections that create alternative paths for the gradient, allowing it to skip intermediate
layers and reach the early layers directly. In this way, the authors were able to train
extremely deep models that previously did not perform well. It is now common practice
to have residual connections in modern CNN architectures. [He+16]

3.5.7. MobileNet & MobileNetV2


The MobileNet architecture is designed for edge computers and smartphones, which
are limited in memory and computing power. This has opened up a large field of
new applications. In contrast to the previous strategy of ever larger architectures,
the model size is reduced by introducing separable convolutions. In simple terms,
a two-dimensional convolutional kernel is split into two separate convolutions: the
depthwise convolution, which collects the spatial information for each individual
channel, and the pointwise convolution, which handles the interaction between the
different channels. Later, MobileNetV2 was introduced with residual connections and
other optimisations in the architecture to further reduce the model size. [How+; San+18]

3.5.8. EfficientNet
The architecture is the result of the observation that the various architectures to date
focus on either accuracy or computational efficiency. The authors of [TL20] claimed that
both goals can be addressed by similar architectures. They proposed a common
CNN skeleton architecture and three parameters: the width, the depth and
the resolution. The width of the model refers to the number of channels present in
the different layers, the depth refers to the number of layers in the model and the
resolution refers to the size of the input image to the model. They claimed that by
keeping all these parameters small, one can create a competitive yet computationally
Model                  Size      Accuracy    Parameters     Depth
VGG16                  528 MB    71.3%       138,357,544    23
VGG19                  549 MB    71.3%       143,667,240    26
ResNet-50              98 MB     74.9%       25,636,712     50
ResNet-101             171 MB    76.4%       44,707,176     101
ResNet-152             232 MB    76.6%       60,419,944     152
InceptionV3            92 MB     77.9%       23,851,784     159
InceptionResNetV2      215 MB    80.3%       55,873,736     572
MobileNet              16 MB     70.4%       4,253,864      88
MobileNetV2            14 MB     71.3%       3,538,984      88
Table 3.4.: Characteristics of different CNN architectures

efficient CNN model. On the other hand, by increasing the value of these parameters,
one can create a model that is designed for accuracy.

3.5.9. General Structure


In the description of the architectures, some elements necessary for the construction
were mentioned. All building blocks can be listed as follows:

• Convolutional layer (convolution)

• Activation function

• Pooling

• Flattening

• Neural network

Each component is characterised by further variables. The combination and configuration
of the building blocks are summarised in the so-called hyperparameters. By
specifying the hyperparameters, the architecture and the number of parameters are
determined. For example, in a neural network, the hyperparameters are the number
of layers, the number of neurons per layer and the choice of the activation
functions. Once the number of all neurons is determined, this yields the number
of parameters that are determined during training.
In the table 3.4 some architectures are compared. The table indicates the number of
parameters to be trained. If an image is classified, the models give different possibilities
and assign a probability to them. The accuracy now indicates the percentage with
which the image matches the prediction with the highest probability. For this purpose,
the models were trained and validated with the ImageNet.
The individual building blocks are now described below.

3.5.10. Input and Output


When a model receives an image as input, arrays of pixel values are passed to it. The
height, the width and the number of colour channels determine the size of these arrays.
The elements of the arrays contain values between zero and 255; this value is the
intensity of the pixel. The model then determines probabilities of how to classify an
image based on these arrays. For example, the result could be that the image is a
turtle with 81%, a gun with 11% and a chocolate bar with 8%.
\[
A \ast C =
\begin{pmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{pmatrix}
\ast
\begin{pmatrix}
c_{11} & c_{12} & c_{13} \\
c_{21} & c_{22} & c_{23} \\
c_{31} & c_{32} & c_{33}
\end{pmatrix}
= \sum_{i=1}^{3} \sum_{j=1}^{3} a_{ij} \cdot c_{ij}
\]
Figure 3.18.: Definition of the convolution of two matrices

3.5.11. Convolution Layer


In a convolution layer, the input data is modified by one or more filters. In image
processing, this is the application of a defined set of filters to two-dimensional data.
The size of this filter is defined by the so-called kernel: a 3 × 3 kernel is represented
by a 3 × 3 matrix. In order to be able to apply the filter, a range of pixel values is
defined that has the same size; the pixel values are then also stored in a matrix. By
convolution of the matrix, as shown in the equation ??, a value is assigned to the
matrix and the filter.
To apply a filter, matrices must therefore be systematically selected from the pixel
values. Since the dimension of a kernel is defined by an odd number 2n + 1, a
surrounding matrix can be assigned to each pixel value that has a distance greater
than n from the edge. In the figure 3.19 such a situation is shown. The filter, labelled
C and coloured purple, has a dimension of 3 × 3. In the matrix A, the field of pixel
values, a region is coloured red. In the middle is the pixel from the fifth column and
the second row. The filter is now applied to this area of the matrix A. The result is
then entered into the matrix A ∗ C. In this case the value is 4 and marked green.
A filter is now moved step by step over the data. The filter can be applied to any
pixel value that is far enough from the edge, or individual pixels can be skipped. This
is defined by the step size, also called stride. With a step size of one, every pixel is
used. To reduce the output size, stride sizes larger than one can be used; it is also
possible to set different stride sizes for width and height. If a step size of 2 is specified
in both directions, the amount of data has been reduced to 25%. [DV16]
In the figure 3.19, the input matrix is a 7 × 7 matrix. Since a step size of one is applied
here, the result is a 5 × 5 matrix.
Often an activation function is applied to each result.

\[
A \ast C =
\begin{pmatrix}
0 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}
\ast
\begin{pmatrix}
1 & 0 & 1 \\
0 & 1 & 0 \\
1 & 0 & 1
\end{pmatrix}
=
\begin{pmatrix}
1 & 4 & 3 & 4 & 1 \\
1 & 2 & 4 & 3 & 3 \\
1 & 2 & 3 & 4 & 1 \\
1 & 3 & 3 & 1 & 1 \\
3 & 3 & 1 & 1 & 0
\end{pmatrix}
\]

Figure 3.19.: Convolution layer with 3 × 3 kernel and stride (1, 1)
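
The convolution in figure 3.19 can be reproduced with a few lines of NumPy; this is only a sketch of the sliding-window computation, not the implementation used by any particular framework:

import numpy as np

A = np.array([[0, 1, 1, 1, 0, 0, 0],
              [0, 0, 1, 1, 1, 0, 0],
              [0, 0, 0, 1, 1, 1, 0],
              [0, 0, 0, 1, 1, 0, 0],
              [0, 0, 1, 1, 0, 0, 0],
              [0, 1, 1, 0, 0, 0, 0],
              [1, 1, 0, 0, 0, 0, 0]])
C = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 0, 1]])

OUT = np.zeros((5, 5), dtype=int)
for i in range(5):                       # stride 1 in both directions
    for j in range(5):
        OUT[i, j] = np.sum(A[i:i + 3, j:j + 3] * C)   # element-wise products, summed
print(OUT)                               # first row: [1 4 3 4 1]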

When defining the filters, make sure that the filter is defined for each channel. To
capture the features in an image, the values in the filter must match the structure of
the original pixel values of the image. Basically, if there is a shape in the input image
that generally resembles the curve that this filter represents, then the convolution will
result in a large value.
As a rule, several filters are used in parallel. The result is so-called feature maps.
Further convolution layers are then applied to this result, which in turn generate
feature maps.
Figure 3.20.: Example application of a filter and an image section

Feature Identification
A convolution is defined by filters. Each filter can be interpreted as an identifier of
features. Features are, for example, edges, colours or even curves. For example, the
7 × 7 filter is considered:

\[
\begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 50 & 0 \\
0 & 0 & 0 & 0 & 50 & 0 & 0 \\
0 & 0 & 0 & 50 & 0 & 0 & 0 \\
0 & 0 & 0 & 50 & 0 & 0 & 0 \\
0 & 0 & 0 & 50 & 0 & 0 & 0 \\
0 & 0 & 0 & 50 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}
\]
This filter can also be visualised as an image. To demonstrate the effect of the filter,
the sample image in figure 3.20 is considered. In this example image, a section is
marked to which the filter is applied:

\[
\begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 30 \\
0 & 0 & 0 & 0 & 80 & 80 & 80 \\
0 & 0 & 0 & 30 & 80 & 0 & 0 \\
0 & 0 & 0 & 80 & 80 & 0 & 0 \\
0 & 0 & 0 & 80 & 80 & 0 & 0 \\
0 & 0 & 0 & 80 & 80 & 0 & 0 \\
0 & 0 & 0 & 80 & 80 & 0 & 0
\end{pmatrix}
\star
\begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 50 & 0 \\
0 & 0 & 0 & 0 & 50 & 0 & 0 \\
0 & 0 & 0 & 50 & 0 & 0 & 0 \\
0 & 0 & 0 & 50 & 0 & 0 & 0 \\
0 & 0 & 0 & 50 & 0 & 0 & 0 \\
0 & 0 & 0 & 50 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}
= 4000 + 1500 + 4000 + 4000 + 4000 = 17\,500
\]
This illustrates the basic property of filters: if there is a shape in the input image that
resembles the filter, the convolution yields large values. A probability is then assigned via
the activation function. In the convolution layer, various filters are defined and applied,
and the result is the feature map. Figure 3.21 shows some examples of concrete
visualisations of filters of the first convolutional layer of a trained network.

Figure 3.21.: Visualisation of filters of a trained network [LXW19; Sha+16]

Edge Detection
Suitable filters can also be defined for edge detection. The filter

\[
\begin{pmatrix}
1 & 0 & -1 \\
1 & 0 & -1 \\
1 & 0 & -1
\end{pmatrix}
\]

can be used for vertical edge detection, whereas the filter

\[
\begin{pmatrix}
1 & 1 & 1 \\
0 & 0 & 0 \\
-1 & -1 & -1
\end{pmatrix}
\]

can be used for a horizontal one. Both filters are now applied to an image with a vertical edge:

\[
\begin{pmatrix}
10 & 10 & 10 & 0 & 0 & 0 \\
10 & 10 & 10 & 0 & 0 & 0 \\
10 & 10 & 10 & 0 & 0 & 0 \\
10 & 10 & 10 & 0 & 0 & 0 \\
10 & 10 & 10 & 0 & 0 & 0 \\
10 & 10 & 10 & 0 & 0 & 0
\end{pmatrix}
\ast
\begin{pmatrix}
1 & 0 & -1 \\
1 & 0 & -1 \\
1 & 0 & -1
\end{pmatrix}
=
\begin{pmatrix}
0 & 30 & 30 & 0 \\
0 & 30 & 30 & 0 \\
0 & 30 & 30 & 0 \\
0 & 30 & 30 & 0
\end{pmatrix}
\]

\[
\begin{pmatrix}
10 & 10 & 10 & 0 & 0 & 0 \\
10 & 10 & 10 & 0 & 0 & 0 \\
10 & 10 & 10 & 0 & 0 & 0 \\
10 & 10 & 10 & 0 & 0 & 0 \\
10 & 10 & 10 & 0 & 0 & 0 \\
10 & 10 & 10 & 0 & 0 & 0
\end{pmatrix}
\ast
\begin{pmatrix}
1 & 1 & 1 \\
0 & 0 & 0 \\
-1 & -1 & -1
\end{pmatrix}
=
\begin{pmatrix}
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix}
\]

Figure 3.22.: Detection of vertical and horizontal edges

Edges can occur in any orientation in images, and edge detection must take this into
account. That is, when a CNN is trained, the filters evolve based on the given images.

3.5.12. Padding
In the previous representation, filters could not be applied to the boundary values. On
the one hand, this means that information is not used; on the other hand, the size of
the images is reduced each time the convolution is applied. To prevent this, padding is
applied: for a filter of dimension (2n + 1) × (2m + 1), the input image is padded with
an additional border of size n and m. The additional values are each set to zero.
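
A small NumPy sketch of zero padding for a 3 × 3 kernel (n = m = 1); the 5 × 5 input is an arbitrary example:

import numpy as np

A = np.arange(1, 26).reshape(5, 5)          # example 5 × 5 input
A_PADDED = np.pad(A, pad_width=1, mode='constant', constant_values=0)
print(A_PADDED.shape)                       # (7, 7): one border of zeros on each side

In Keras, the same effect is obtained by creating a convolutional layer with the argument padding='same' instead of the default 'valid'.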

3.5.13. Activation function


The activation function is applied to the respective output of a layer and determines
how the output value should be interpreted by applying a defined function. Here,
the respective input values, as well as a bias, are included in the calculation and
determine whether the neuron is triggered [Nwa+18]. In this example, the Rectified
Linear Unit (ReLu) function is used for this: for input values less than or equal to

Figure 3.23.: Representation of the ReLu activation function

\[
\operatorname{ReLU}(A) =
\begin{pmatrix}
\operatorname{ReLU}(a_{11} + b) & \operatorname{ReLU}(a_{12} + b) & \cdots & \operatorname{ReLU}(a_{1m} + b) \\
\operatorname{ReLU}(a_{21} + b) & \operatorname{ReLU}(a_{22} + b) & \cdots & \operatorname{ReLU}(a_{2m} + b) \\
\vdots & \vdots & \ddots & \vdots \\
\operatorname{ReLU}(a_{n1} + b) & \operatorname{ReLU}(a_{n2} + b) & \cdots & \operatorname{ReLU}(a_{nm} + b)
\end{pmatrix}
\]

Figure 3.24.: Application of an activation function with a bias b ∈ R to a matrix

zero, the function returns zero; for positive values, it behaves linearly [Aga18b]. Thus,
only positive values are returned (figure 3.23).
For example, an activation function can be applied after a convolution. This is shown
in figure 3.24.
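
A minimal NumPy sketch of this element-wise application (the values of the matrix and the bias are arbitrary example numbers):

import numpy as np

A = np.array([[-2.0, 0.5],
              [1.5, -0.3]])
B = 0.1                                # bias
RELU_A = np.maximum(A + B, 0)          # negative entries become 0, positive entries stay linear
print(RELU_A)                          # [[0.  0.6] [1.6 0. ]]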

3.5.14. Pooling
After the use of convolutional layers, pooling layers are often used in CNN to reduce
the size of the representation, to also speed up the calculations as well as to make some
of the detection functions a bit more robust. A distinction is usually made between
two pooling methods.

• Max Pooling

• Average Pooling

In pooling, areas of a matrix are combined. The ranges are defined by two values n
and m. The ranges are submatrices of the matrix of size n × m. A function is then
applied to each submatrix.
In max-pooling, the maximum value of the submatrix is determined and entered as
the result. In the case of average pooling, the average value of the matrix elements is
determined instead of the maximum value and used further.
In the case of pooling, a window with a defined size is moved over the input matrix
and the corresponding value within the window is taken over as a pooled value. The
step size with which the window moves over the matrix is defined via a parameter
[DV16]. The default value corresponds to the size of the sub-matrix. An example
is shown in figure 3.25: a 4 × 4 matrix is processed with a pool_size of 2 × 2 and a
stride of 2 × 2. Since two values are combined in each dimension, the dimensions of
the output matrix are halved compared to the input matrix. [Bri15]

2 0 1 1
0 1 0 0     pool_size = (2,2)      2     1
0 0 1 0     stride = (2,2)         3     1
0 3 0 0     (max pooling)

2 0 1 1
0 1 0 0     pool_size = (2,2)      0.75    0.50
0 0 1 0     stride = (2,2)         0.75    0.25
0 3 0 0     (average pooling)

Figure 3.25.: Application of max-pooling and average-pooling
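
The result in figure 3.25 can be checked, for example, with the Keras pooling layers; the reshape adds the batch and channel dimensions that these layers expect:

import numpy as np
import tensorflow as tf

X = np.array([[2, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 3, 0, 0]], dtype=np.float32).reshape(1, 4, 4, 1)

MAX_POOL = tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2))
AVG_POOL = tf.keras.layers.AveragePooling2D(pool_size=(2, 2), strides=(2, 2))
print(MAX_POOL(X)[0, :, :, 0].numpy())   # [[2. 1.] [3. 1.]]
print(AVG_POOL(X)[0, :, :, 0].numpy())   # [[0.75 0.5 ] [0.75 0.25]]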

3.5.15. Flattening
With a flatten layer, the multidimensional input object is converted into a vector.
For example, if an object with dimensions 5 × 7 × 7 is passed in, it results in a
one-dimensional vector with 245 elements. In the figure 3.26, the function Flatten is
applied to a 3 × 3 matrix. The function is used when a neural network is subsequently
used.

\[
\operatorname{Flatten}
\begin{pmatrix}
1 & 1 & 0 \\
4 & 2 & 1 \\
0 & 2 & 1
\end{pmatrix}
=
\begin{pmatrix}
1 & 1 & 0 & 4 & 2 & 1 & 0 & 2 & 1
\end{pmatrix}
\]

Figure 3.26.: Application of flattening

3.5.16. Dense Layer


A dense layer, or fully connected layer, is a layer that forms a classical neural network.
This means that each of the previous outputs is fed into each of the inputs of the layer,
and each output is likewise linked to each of the subsequent inputs. This type of layer
is usually used to classify on the basis of the feature maps generated by the
convolutions. [SIG20a] It is determined by the number of neurons and the associated
activation function.

3.5.17. Model used for a CIFAR 10 classification


The model used is an object keras.models.Sequential. With the help of the method
add, the individual layers can be added to this model. The structure of the model is
modelled on the TensorFlow tutorial for CNNs [Goo20b] and is shown in figure 3.27.
The layers described previously are now used to create a model for classifying the
dataset Canadian Institute For Advanced Research (CIFAR)-10 with Keras. Here we
look at how these layers are applied in the Python environment and how the size and
dimension of the data change between the individual layers. The dimensions of the
input and output data of the individual layers are shown in figure 3.27.
At the beginning, as shown in listing 3.1, a Keras object Sequential is created. The first
layer added to this model is a convolution layer Conv2D, see line 2. This is defined
with 32 filters, each with a size of 3 × 3. If the parameter stride is not explicitly
specified when the object is created, the default step size (1, 1) is used. Since the
input data into the model are RGB images with 32 × 32 pixels, see section 4.1.2, the
parameter input_shape is set to 32 × 32 × 3. This

1 MODEL = models.Sequential()
2 MODEL.add(layers.Conv2D(filters=32, kernel_size=(3, 3), activation=’relu’,
,→ input_shape=(32, 32, 3)))
3 MODEL.add(layers.MaxPooling2D(pool_size=(2, 2)))
4 MODEL.add(layers.Conv2D(filters=64, kernel_size=(3, 3), activation=’relu’))
5 MODEL.add(layers.MaxPooling2D(pool_size=(2, 2)))
6 MODEL.add(layers.Conv2D(filters=64, kernel_size=(3, 3), activation=’relu’))
7 MODEL.add(layers.Flatten())
8 MODEL.add(layers.Dense(units=64, activation=’relu’))
9 MODEL.add(layers.Dense(units=10))

Listing 3.1.: Building a CNN with Keras



configuration transforms the input data in the first convolutional layer from 32×32×3
to an output of 30 × 30 × 32. A separate feature map is stored for each of the applied
filters.
The next layer, a max-pooling layer with a pool_size of 2 × 2, reduces the data size
from 30 × 30 × 32 to 15 × 15 × 32 (line 3).
conv2d_input: InputLayer          input: [(?, 32, 32, 3)]   output: [(?, 32, 32, 3)]
conv2d: Conv2D                    input: (?, 32, 32, 3)     output: (?, 30, 30, 32)
max_pooling2d: MaxPooling2D       input: (?, 30, 30, 32)    output: (?, 15, 15, 32)
conv2d_1: Conv2D                  input: (?, 15, 15, 32)    output: (?, 13, 13, 64)
max_pooling2d_1: MaxPooling2D     input: (?, 13, 13, 64)    output: (?, 6, 6, 64)
conv2d_2: Conv2D                  input: (?, 6, 6, 64)      output: (?, 4, 4, 64)
flatten: Flatten                  input: (?, 4, 4, 64)      output: (?, 1024)
dense: Dense                      input: (?, 1024)          output: (?, 64)
dense_1: Dense                    input: (?, 64)            output: (?, 10)

Figure 3.27.: Structure of the neural network used

This data is now fed into another convolutional layer, which is configured in listing 3.1,
line 4, with a kernel_size of 3 × 3 but with 64 filters. This transforms the data
from 15 × 15 × 32 to 13 × 13 × 64.
To reduce the amount of data again, another max-pooling layer is applied, which is
also configured with a pool_size of 2 × 2 in line 5, reducing the amount of data
from 13 × 13 × 64 to 6 × 6 × 64.
The last convolutional layer is then applied to this network. This, like the previous
convolutional layer, works with a kernel_size of 3 × 3 and 64 filters as per line 6. This
transforms the data from 6 × 6 × 64 to 4 × 4 × 64.
To enable classification of this information with a neural network, the so-called dense
layer, the results of the convolutional layers are transformed into a vector with a
flatten layer, see line 7. Due to the input size of 4 × 4 × 64 this results in a vector of

length 1024.
The vector is fed into a Dense layer in line 8. This layer is parameterised with 64
units and the activation function ReLu, which combines the 1024 input data into 64
nodes and calculates the corresponding outputs.
In the last layer, the 64 output values are passed to another Dense layer, which has
been configured with 10 units. The input data are thus now reduced to 10 output
data, which correspond to the classes to be identified of the data set CIFAR-10.
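
As a possible continuation (a sketch, not necessarily the training setup used later in this book), the model from listing 3.1 can be compiled and trained on the CIFAR-10 data loaded as in listing 4.2; the optimiser, loss and number of epochs are assumptions:

import tensorflow as tf

# MODEL comes from listing 3.1, the TRAIN_/TEST_ arrays from listing 4.2
MODEL.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
MODEL.summary()                                  # prints the layer shapes shown in figure 3.27
HISTORY = MODEL.fit(TRAIN_IMAGES, TRAIN_LABELS, epochs=10,
                    validation_data=(TEST_IMAGES, TEST_LABELS))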
4. Databases and Models for Machine
Learning
Data is the basis for machine learning. The success of the models depends on its
quality, because machine learning relies on accurate and reliable information when
training its algorithms. This is self-evident, but unfortunately not sufficiently taken
into account: poor data leads to insufficient or incorrect results. The training data is
realistic if it reflects the data that the AI system will record in real use. Unrealistic
data sets hinder machine learning and lead to expensive misinterpretations. If one
wants to develop software for drone cameras, realistic images must be used. If one
instead uses corresponding images from the web, they usually have the following
characteristics:

• The perspective is roughly at head height.

• The targeted object is located in the centre of the image.

If one wants to use data sets for one’s own needs, care must be taken that only data
that are also realistic are used. The data sets must also not contain any outliers or
redundancies. When checking the quality of the data, the following questions can be
helpful:

• By what means and what technique was the data generated?

• Is the data source credible?

• With what intention was the data collected?

• Where does the data come from? Is it suitable for the intended application?

• How old is the data?

• In what environment/conditions was the data created?

If necessary, collect your own data or have it collected.


Anyone who does data science can measure their developed algorithms against the
results of others by using standardised data sets. Many databases and pre-trained
models are available on the internet for this purpose. This chapter describes some
of them. It should be noted that many data sets are available in different versions.
Depending on the provider, the data may already have been processed and prepared
for training. Here it is important to make sure that a suitable variant is used. Access
to different data sets is available on the following internet sites:

Kaggle.com: More than 20,000 data sets are offered here. Only a free user account is
required.

lionbridge.ai: The website offers a good overview of data sets from the public and
commercial sectors.

govdata.de: The Data Portal for Germany offers freely available data from all areas
of public administration in Germany.


import tensorflow as tf
import tensorflow_datasets as tfds

# Construct a tf.data.Dataset
ds = tfds.load('mnist', split='train', shuffle_files=True)

# Build your input pipeline
ds = ds.shuffle(1024).batch(32).prefetch(tf.data.experimental.AUTOTUNE)
for example in ds.take(1):
    image, label = example["image"], example["label"]

Listing 4.1.: Loading a dataset with TensorFlow

American government database: The American government also operates a portal


where records of the administration are available.

scikit data sets: Data sets are also installed with the Python library scikit-learn. There
are only a few datasets, but they are already prepared so that they are easy to load
and use.

UCI - Center for Machine Learning and Intelligent Systems: The University of California
at Irvine offers around 600 data sets for study.

TensorFlow: TensorFlow Datasets is a collection of ready-to-use datasets. All
datasets are made available as tf.data.Dataset objects. The datasets can also be
retrieved individually via GitHub.

Open Science Data Cloud: The platform aims to create a way for everyone to access
high-quality data. Researchers can house and share their own scientific data,
access complementary public datasets, create and share customised virtual
machines with the tools they need to analyse their data.

Amazon: Amazon also provides data sets. You have to register for this free of charge.

KDnuggets.com: Similar to Kaggle, but links to other websites.

paperswithcode.com: The platform provides a possibility to exchange data sets. Here


you will also find many well-known datasets with their links.

Google Dataset Search: The website does not directly offer datasets, but rather
search support. Google restricts its search engine here to data sets.

faces: Labeled Faces in the Wild is a public benchmark for face verification, also
known as pair matching. No matter what the performance of an algorithm, it
should not be used to conclude that an algorithm is suitable for any commercial
purpose.

4.1. Databases
4.1.1. Data Set MNIST
The dataset MNIST is available for free use and contains 70,000 images of handwritten
digits with the corresponding correct classification. [Den+09; Den12] The name of the
dataset is justified by its origin, as it is a modified dataset from two datasets from
the US National Institute of Standards and Technology. These contain handwritten
digits from 250 different people, consisting of US Census Bureau employees and high
school students. The NIST Special Database 3 (SD-3) and NIST Special Database 1
(SD-1) datasets collected in this way were merged because the former dataset of clerical
workers contains cleaner and more easily recognisable data. Initially, SD-3 was used

Figure 4.1.: Examples from the training set of the dataset MNIST [SSS19]

as the training set and SD-1 as the test set, but because of the differences, it makes
more sense to merge the two. The dataset is already split into 60,000 training images
and 10,000 test images. [LCB13; Nie15]
The data is provided in four files, two for the training dataset and two for the test
dataset, one of which contains the image data and the other the associated labels:

• train-images-idx3-ubyte.gz: Training images (9912422 bytes)

• train-labels-idx1-ubyte.gz: Classification of the training data (28881 bytes)

• t10k-images-idx3-ubyte.gz: Test images (1648877 bytes)

• t10k-labels-idx1-ubyte.gz: Classification of the test data (4542 bytes)

The images in the dataset MNIST are greyscale images of size 28 × 28 pixels.[LCB13]
The data is in IDX file format and so cannot be opened and visualised by default.
However, you can write a programme to convert the data into CSV format or load a
variant in CSV format directly from other websites. [Red20]
The library TensorFlow provides the dataset MNIST under tensorflow_datasets.
This is not the only dataset provided by TensorFlow; a list of all datasets that can
be loaded in this way can be found in the catalogue of datasets.

4.1.2. CIFAR-10 and CIFAR-100


Data Set CIFAR-10
The data sets CIFAR-10 and CIFAR-100 were developed by Alex Krizhevsky and his
team and are named after the Canadian Institute for Advanced Research.
The data set CIFAR-10 is a partial data set of the data set 80 Million Tiny Images.
This part consists of 60,000 images, divided into 50,000 training images and 10,000
test images, which are divided into 10 classes and labelled accordingly. The existing
classes represent aeroplanes, cars, birds, cats, deer, dogs, frogs, horses, ships and
trucks. Thus, 6,000 images exist per class, with each image having a size of 32 × 32
pixels with three colour channels [KH09; KSH17]. An extract of the dataset can be
seen in Abb. 4.2.


Figure 4.2.: Extract from ten illustrations of the data set CIFAR-10

1 from tensorflow.keras import datasets
2 (TRAIN_IMAGES, TRAIN_LABELS), (TEST_IMAGES, TEST_LABELS) = datasets.cifar10.load_data()
3 TRAIN_IMAGES = TRAIN_IMAGES / 255.0
4 TEST_IMAGES = TEST_IMAGES / 255.0

Listing 4.2.: Load and normalise the data set CIFAR-10

To import the dataset into the Python environment, the module keras.datasets is used
here. This contains a direct interface to the dataset CIFAR-10, which is downloaded
to the local system by calling the function load_data(). In the background, the
dataset is downloaded as a packed file with the extension .tar.gz and stored in the
folder C:\Users\<username>\.keras\datasets. After unpacking, the dataset is
contained in several serialised objects, which are loaded into the Python environment
via pickle. The training data, training labels, test data and test labels can then be
accessed via corresponding lists (section 4.1.2). In order to reduce the vanishing
gradient problem during the later training, the RGB colour values stored in the data
set with the value range from 0 to 255 are converted into values from 0 to 1. Listing 4.2
shows the loading and normalisation of the data.

Data Set CIFAR-100

The data set CIFAR-100, on the other hand, also contains 60,000 images, but divided
into 100 categories with 600 images each. In addition, there are 20 upper categories,
to each of which five of the 100 categories are assigned; see figure 4.3.
The dataset can also be downloaded and imported directly into TensorFlow via Keras
[Raj20]:

1 import tensorflow as tf
2 from tensorflow.keras import datasets
3
4 (TRAIN_IMAGES100, TRAIN_LABELS100), (TEST_IMAGES100, TEST_LABELS100) = tf.keras.
,→ datasets.cifar100.load_data()
5 TRAIN_IMAGES100 = TRAIN_IMAGES100 / 255.0
6 TEST_IMAGES100 = TEST_IMAGES100 / 255.0

Listing 4.3.: Load and normalise the data set CIFAR-100


Superclass Classes
aquatic mammals beaver, dolphin, otter, seal, whale
fish aquarium fish, flatfish, ray, shark, trout
flowers orchids, poppies, roses, sunflowers, tulips
food containers bottles, bowls, cans, cups, plates
fruit and vegetables apples, mushrooms, oranges, pears, sweet peppers
household electrical devices clock, computer keyboard, lamp, telephone, television
household furniture bed, chair, couch, table, wardrobe
insects bee, beetle, butterfly, caterpillar, cockroach
large carnivores bear, leopard, lion, tiger, wolf
large man-made outdoor things bridge, castle, house, road, skyscraper
large natural outdoor scenes cloud, forest, mountain, plain, sea
large omnivores and herbivores camel, cattle, chimpanzee, elephant, kangaroo
medium-sized mammals fox, porcupine, possum, raccoon, skunk
non-insect invertebrates crab, lobster, snail, spider, worm
people baby, boy, girl, man, woman
reptiles crocodile, dinosaur, lizard, snake, turtle
small mammals hamster, mouse, rabbit, shrew, squirrel
trees maple, oak, palm, pine, willow
vehicles 1 bicycle, bus, motorcycle, pickup truck, train
vehicles 2 lawn-mower, rocket, streetcar, tank, tractor

Figure 4.3.: Supercategories and categories in CIFAR-100 with the original designations

4.1.3. Fisher’s Iris Data Set


The recognition of the iris species is another classic of classification, and there are
several tutorials on this classic example. The Fisher's Iris Data Set was used in R.A. Fisher's
classic 1936 work and can be found in the UCI Machine Learning Repository. [Fis;
Fis36a; SW16; UK21] Four features of the flowers Iris Setosa, Iris Versicolour and
Iris Virginica were measured. For each of the three classes 50 data sets with four
attributes are available. The width and length of the sepal and petal were measured
in centimetres. [And35; SP06]. One species of flower is linearly separable from the
other two, but the other two are not linearly separable from each other.
The data set can be accessed in several places:

• Archive of machine learning datasets from the University of California at Irvine.


[Fis36b]
• Python package skikitlearn [Ped+11]
• Kaggle - Machine Learning Website [Kagb]

The data set is available on various websites, but attention must be paid to the structure
of the data set. Since it is a file in CSV format, the structure is column-oriented. As a
rule, the title of the individual column is given in the first line: sepal_length,
sepal_width, petal_length, petal_width and species. The values are in centimetres,
the species is given as setosa for Iris setosa, versicolor for Iris versicolor and
virginica for the species Iris virginica. The columns in this record are:

• Sequence number
• Length of sepal in centimetres
• Sepal width in centimetres
• Petal length in centimetres

1 from sklearn.datasets import load_iris


2 iris = load_iris()

Listing 4.4.: Loading the Fisher’s Iris Data Set

• Petal width in centimetres


• Class

The aim is to classify the three different iris species based on the length and width
of the sepal and petal. Since the dataset is delivered with the library sklearn, this
approach is chosen.
The record is a dictionary. Its keys can be easily displayed:
1 >>> iris.keys()

The corresponding output is as follows:


dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', ...])
The individual elements can now be viewed. After entering the command
1 iris[’DESCR’]

a detailed description is output:


With the input
1 iris[’feature_names’]

you get the names of the attributes:


['sepal length (cm)',
 'sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)']
The names of the flower species that appear after entering
1 iris[’target_names’]

are the following:


array(['setosa', 'versicolor', 'virginica'], dtype='<U10')
For further investigation, the data with the headings are included in a data frame.
1 import pandas as pd
2 X = pd.DataFrame(data=iris.data, columns=iris.feature_names)
3 print(X.head())

The command X.head() shows – as in figure 4.4 – the head of the data frame. It
is obvious that each data set consists of four values.

Figure 4.4.: Header lines of Fisher’s Iris Data Set

Each record also already contains its classification in the key target. In figure 4.5
this is listed for the first data sets.

1 ’.. _iris_dataset:
2
3 Iris plants dataset
4 --------------------
5
6 **Data Set Characteristics:**
7
8 :Number of Instances: 150 (50 in each of three classes)
9 :Number of Attributes: 4 numeric, predictive attributes and the class
10 :Attribute Information:
11 - sepal length in cm
12 - sepal width in cm
13 - petal length in cm
14 - petal width in cm
15 - class:
16 - Iris-Setosa
17 - Iris-Versicolour
18 - Iris-Virginica
19
20 :Summary Statistics:
21
22 ============== ==== ==== ======= ===== ====================
23 Min Max Mean SD Class Correlation
24 ============== ==== ==== ======= ===== ====================
25 sepal length: 4.3 7.9 5.84 0.83 0.7826
26 sepal width: 2.0 4.4 3.05 0.43 -0.4194
27 petal length: 1.0 6.9 3.76 1.76 0.9490 (high!)
28 petal width: 0.1 2.5 1.20 0.76 0.9565 (high!)
29 ============== ==== ==== ======= ===== ====================
30
31 :Missing Attribute Values: None
32 :Class Distribution: 33.3% for each of 3 classes.
33 :Creator: R.A. Fisher
34 :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
35 :Date: July, 1988
36
37 The famous Iris database, first used by Sir R.A. Fisher. The dataset is taken
38 from Fisher's paper. Note that it's the same as in R, but not as in the UCI Machine Learning Repository, which has two wrong data points.
39
40 This is perhaps the best known database to be found in the
41 pattern recognition literature. Fisher's paper is a classic in the field and
42 is referenced frequently to this day. (See Duda & Hart, for example.) The
43 data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other.
44
45 .. topic:: References
46
47 - Fisher, R.A. "The use of multiple measurements in taxonomic problems"
48 Annual Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to
49 Mathematical Statistics" (John Wiley, NY, 1950).
50 - Duda, R.O., & Hart, P.E. (1973) Pattern Classification and Scene Analysis.
51 (Q327.D83) John Wiley & Sons. ISBN 0-471-22361-1. See page 218.
52 - Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System
53 Structure and Classification Rule for Recognition in Partially Exposed
54 Environments". IEEE Transactions on Pattern Analysis and Machine
55 Intelligence, Vol. PAMI-2, No. 1, 67-71.
56 - Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule". IEEE Transactions
57 on Information Theory, May 1972, 431-433.
58 - See also: 1988 MLC Proceedings, 54-64. Cheeseman et al"s AUTOCLASS II
59 conceptual clustering system finds 3 classes in the data.
60 - Many, many more ...’

Listing 4.5.: Description of Fisher’s Iris Data Set



1 y = pd.DataFrame(data=iris.target, columns = [’irisType’])


2 y.head()

Figure 4.5.: Output of the categories of Fisher’s Iris Data Set

The command

1 y.irisType.value_counts()

can be used to determine how many classes are present. The output of the command is shown in Figure 4.6; it results in 3 classes with the numbers 0, 1 and 2. There are 50 records assigned to each of them.

Figure 4.6.: Names of the categories in Fisher’s Iris Data Set
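To illustrate the stated aim of classifying the three species from the four measurements, the following sketch trains a simple classifier with scikit-learn. The choice of a k-nearest-neighbour model, the 75/25 split and the fixed random seed are assumptions made for illustration only, not a prescribed approach.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()

# Hold back a quarter of the records for evaluation (assumed split)
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=0)

# A simple k-nearest-neighbour classifier as an example model
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

print("Accuracy on the test set:", knn.score(X_test, y_test))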

4.1.4. Common Objects in Context - COCO


The dataset Common Objects in Context (COCO) is labelled and provides data for
training supervised computer vision models that are able to identify common objects
in the dataset. Of course, these models are still far from perfect. Therefore, the
dataset COCO provides a benchmark for evaluating the periodic improvement of these
models through computer vision research.[Lin+14; Lin+21]

• Object recognition
– The dataset COCO contains ≈ 330,000 images.
– The dataset COCO contains ≈ 1,500,000 annotations for objects.
– The dataset COCO contains 80 object categories.
– Each image is annotated with five captions.
– The images have an average resolution of 640 × 480 pixels.

• Semantic segmentation
– Panoptic segmentation requires models that draw boundaries between objects in semantic segmentation.

• Keypoint detection
– The images contain 250,000 people labelled with the corresponding keypoints.

{
    "info": {...},
    "licenses": {...},
    "images": {...},
    "categories": {...},
    "annotations": {...}
}

Listing 4.6.: Structure of the dataset COCO

{
    "description": "COCO 2017 Dataset",
    "url": "http://cocodataset.org",
    "version": "1.0",
    "year": 2017,
    "contributor": "COCO Consortium",
    "date_created": "2017/09/01"
}

Listing 4.7.: Meta information of the dataset COCO

Due to the size and frequent use of the dataset, there are many tools, for example COCO-annotator and COCOapi, to access the data.
Actually, the dataset COCO consists of several subsets, each made for a specific machine learning task. The first task is to determine bounding rectangles for objects. That is, objects are identified and the coordinates of the surrounding rectangle are determined. The extended task is object segmentation. Here, objects are also identified, but instead of surrounding rectangles, polygons are drawn to delimit the objects. The third classical task is stuff segmentation. The model should perform segmentation not on individual objects, but on continuous background patterns such as grass or sky.

The annotations are stored in JSON format (see Listing 4.6). A JSON file is a dictionary with key-value pairs in curly brackets. It can also contain lists, i.e. ordered collections of elements within square brackets, or further dictionaries nested within them.

Section “Info”
The dictionary for the info section contains metadata about the dataset. For the official dataset COCO it is the information shown in Listing 4.7.
As can be seen, only basic information is included, with the url value pointing to the official website of the dataset. It is common for machine learning datasets to point to their websites for additional information. For example, there you can find information on how and when the data was collected.

In the section “licenses” you will find links to licenses for images in the dataset with
the following structure:

[
    {
        "url": "http://creativecommons.org/licenses/by-nc-sa/2.0/",
        "id": 1,
        "name": "Attribution-NonCommercial-ShareAlike License"
    },
    {
        "url": "http://creativecommons.org/licenses/by-nc/2.0/",
        "id": 2,
        "name": "Attribution-NonCommercial License"
    },
    ...
]

Listing 4.8.: Licence information of the dataset COCO

{
    "license": 3,
    "file_name": "000000391895.jpg",
    "coco_url": "http://images.cocodataset.org/train2017/000000391895.jpg",
    "height": 360,
    "width": 640,
    "date_captured": "2013-11-14 11:18:45",
    "flickr_url": "http://farm9.staticflickr.com/8186/8119368305_4e622c8349_z.jpg",
    "id": 391895
}

Listing 4.9.: Image information of the dataset COCO

The dictionary “images” is probably the second most important and contains metadata about the images (see Listing 4.9).
The images dictionary contains the field license. In this field the licence of the image is given; the full licence text can be found via the corresponding URL in the “licenses” section. When using the images, it must be ensured that no licence infringement occurs. If in doubt, do not use them. This also means, however, that when creating one’s own dataset, one assigns an appropriate licence to each image.
The most important field is the id field. This is the number used in the annotations section to identify the image. So, for example, if you want to find the annotations for a given image file, you have to look up the value of the id field of the corresponding image entry in the images section and then cross-reference it in the annotations section.
In the official dataset COCO, the value of the id field is the same as the file name in file_name after removing the leading zeros and the file extension. If one uses a custom COCO-style dataset, this may not necessarily be the case.
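As a minimal sketch of this cross-referencing, the following snippet uses only Python's json module. The file name instances_val2017.json is an assumed example path to a downloaded COCO annotation file, not a fixed requirement.

import json

# Hypothetical path to a COCO annotation file
with open("instances_val2017.json") as f:
    coco = json.load(f)

image_id = coco["images"][0]["id"]   # id of the first image entry

# Collect all annotations whose image_id refers to this image
annotations = [a for a in coco["annotations"] if a["image_id"] == image_id]
print(len(annotations), "annotations for image", image_id)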

Section “Categories”
The categories section is a little different from the other sections. It is designed both for the task of object recognition and segmentation and for the task of stuff segmentation.
For object recognition and object segmentation, the information is structured according to Listing 4.10.

[
    {"supercategory": "person", "id": 1, "name": "person"},
    {"supercategory": "vehicle", "id": 2, "name": "bicycle"},
    {"supercategory": "vehicle", "id": 3, "name": "car"},
    ...
    {"supercategory": "indoor", "id": 90, "name": "toothbrush"}
]

Listing 4.10.: Class information of the dataset COCO

[
    {"supercategory": "textile", "id": 92, "name": "banner"},
    {"supercategory": "textile", "id": 93, "name": "blanket"},
    ...
    {"supercategory": "other", "id": 183, "name": "other"}
]

Listing 4.11.: Stuff information of the dataset COCO

In this section, the lists contain the categories of objects that can be recognised in images. Each category has a unique number id, which should be in the range [1, number of categories]. Categories are also grouped into supercategories, which can be used in programs to recognise, for example, vehicles in general when it does not matter whether it is a bicycle, a car or a truck.

There are separate lists for stuff segmentation, see Listing 4.11. The category numbers in this section start high to avoid conflicts with object segmentation, as these tasks can be performed together in the panoptic segmentation task. The values from 92 to 182 represent well-defined background material, while the value 183 represents all other background textures that do not have their own classes.

Section “Annotations”
The annotations section is the most important section of the dataset, containing the essential annotation information for each task.
The fields according to Listing 4.12 have the following meaning.

“segmentation”: This is a list of segmentation masks for pixels; this is a flattened list
of pairs, so you should take the first and second values (x and y in the image),
then the third and fourth, and so on, to get coordinates; note that these are
not image indices, as they are floating point numbers — they are created and
compressed from the pixel coordinates by tools like COCO-annotator.

“area”: This corresponds to the number of pixels within a segmentation mask.

“iscrowd”: This is a flag indicating whether the annotation applies to a single object (value 0) or to several objects close to each other (value 1); for stuff segmentation, this field is always 0 and is ignored.

“image_id”: This field corresponds to the id field from the images dictionary; warning: use this value to cross-reference the image with other dictionaries, not the annotation’s own id field.

“bbox”: The field contains the bounding box, i.e. the x and y coordinates of the upper left corner, as

{
    "segmentation":
    [[
        239.97,
        260.24,
        222.04,
        ...
    ]],
    "area": 2765.1486500000005,
    "iscrowd": 0,
    "image_id": 558840,
    "bbox":
    [
        199.84,
        200.46,
        77.71,
        70.88
    ],
    "category_id": 58,
    "id": 156
}

Listing 4.12.: Annotations of the dataset COCO

well as the width and height of the rectangle around the object; it is very useful for extracting individual objects from images, since in many languages such as Python this can be done by slicing the image array. Note that image arrays are indexed row-first (y before x) and require integer indices, so the crop is
cropped_object = image[int(bbox[1]):int(bbox[1] + bbox[3]), int(bbox[0]):int(bbox[0] + bbox[2])]

“category_id”: The field contains the class of the object, corresponding to the field
id from the section “categories”

“id”: The number is the unique identifier for the annotation; note that this is only
the ID of the annotation, so it does not refer to the respective image in other
dictionaries.

When working with crowd images (“iscrowd”: 1), the “segmentation” part may look a little different:

"segmentation":
{
    "counts": [179, 27, 392, 41, ..., 55, 20],
    "size": [426, 640]
}

This is because, with a large number of pixels, explicitly listing all the pixels that form a segmentation mask would take a lot of space. Instead, the dataset COCO uses Run-Length Encoding (RLE) as compression, which is very efficient because segmentation masks are binary and run-length encoding of zeros and ones can reduce the size many times over.

4.1.5. ImageNet
ImageNet is an image database with more than 14 million images that is constantly
growing. Each image is assigned to a noun. For each category there are on average
more than 500 images. ImageNet contains more than 20,000 categories in English
with a typical category such as „balloon“ or „strawberry“. The database of third-party
image URL annotations is freely accessible directly through ImageNet, although the
actual images are not owned by ImageNet. [Den+09; Sha+20; KSH12b]

4.1.6. Visual Wake Words


The „Visual Wake Words“ dataset consists of 115,000 images, each with an indication
of whether or not it contains a person. [Cho+19] Images are extracted from the COCO
dataset. Since the dataset COCO provides annotations for the images, it can be determined during extraction whether a person appears inside a bounding box or not. Each extracted image is labelled with this information.

4.2. FaceNet
[SKP15]
FaceNet learns a mapping from face images to a compact Euclidean space in which distances correspond directly to a measure of face similarity. Once this space exists, tasks such as face recognition, verification and clustering can be accomplished easily with standard techniques, using the FaceNet embeddings as features. FaceNet uses a deep CNN that is trained to optimise the embedding itself, instead of using the output of an intermediate bottleneck layer. Training is performed with triplets: one image of a face ("anchor"), another image of the same face ("positive example") and an image of a different face ("negative example"). The main advantage is representational efficiency: state-of-the-art performance can be achieved with only 128 bytes per face (record accuracy of 99.63 % on LFW, 95.12 % on the YouTube Faces DB).
see ../../MLbib/CNN/class10_FaceNet.pdf
FaceNet is a deep convolutional neural network developed by Google researchers and introduced around 2015 to effectively overcome the hurdles of face recognition and verification. The FaceNet algorithm transforms the face image into a 128-dimensional Euclidean space, similar to word embedding [9]. The FaceNet model created in this way is trained with a triplet loss to capture the similarities and differences in the provided image dataset. The 128-dimensional embeddings produced by the model can be used to cluster faces very effectively and precisely. By using FaceNet embeddings as feature vectors, functionalities such as face recognition and verification can be implemented once the vector space has been created [10]. In short, the distances between similar images are much smaller than between random, dissimilar images. The general block diagram of the FaceNet approach to face recognition is shown in Fig. 1.
see ../../MLbib/CNN/10.1109ICACCS.2019.8728466.pdf
Input format? Colours? Pixel resolution?
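The following sketch only illustrates the idea of comparing faces via embedding distances. The embeddings are random placeholders and the threshold is an assumed value; both depend on the concrete FaceNet model and must be calibrated on real data.

import numpy as np

# Hypothetical 128-dimensional embeddings, e.g. produced by a FaceNet model
emb_anchor = np.random.rand(128)
emb_candidate = np.random.rand(128)

# Squared Euclidean distance between the two embeddings;
# small distances indicate the same person, large distances a different person.
distance = np.sum((emb_anchor - emb_candidate) ** 2)

THRESHOLD = 1.0   # assumed decision threshold, to be calibrated on a validation set
print("same person" if distance < THRESHOLD else "different person")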

4.3. Models
When installing NVIDIA’s jetson-inference project, the utility offers a large selection of pre-trained deep learning models for download. Among them are well-known ones like AlexNet from 2012 as well as various so-called ResNets. Also included
Network CLI argument NetworkType enum
AlexNet alexnet ALEXNET
GoogleNet googlenet GOOGLENET
GoogleNet-12 googlenet-12 GOOGLENET_12
ResNet-18 resnet-18 RESNET_18
ResNet-50 resnet-50 RESNET_50
ResNet-101 resnet-101 RESNET_101
ResNet-152 resnet-152 RESNET_152
VGG-16 vgg-16 VGG-16
VGG-19 vgg-19 VGG-19
Inception-v4 inception-v4 INCEPTION_V4

Table 4.1.: Pre-trained models of the project jetson-inference

are SSD-Mobilenet-V2, which recognises 90 objects from apples to toothbrushes, and


DeepScene from the University of Freiburg for recognising objects in road traffic.
To download more models, the Model Downloader can be called up:
$ cd jetson-inference/tools
$ ./download-models.sh

The models GoogleNet and ResNet-18, which are trained on the ImageNet image database, are automatically downloaded in the build step. Table 4.1 lists all models that are available in the jetson-inference project. The first column of the table is the name of the model architecture [Alo+18]. The second column contains the parameter that is passed to the --network argument of the imagenet-camera.py program.
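As a hedged illustration, the pre-trained models can also be selected from a few lines of Python using the jetson-inference bindings. The network name "resnet-18" is taken from Table 4.1; the camera source csi://0 (the Raspberry Pi camera on the CSI connector) is an assumption, and the exact module names may differ between versions of the project.

import jetson.inference
import jetson.utils

# Load one of the pre-trained classification networks from Table 4.1
net = jetson.inference.imageNet("resnet-18")

# Assumed camera source: the Raspberry Pi camera on the CSI connector
camera = jetson.utils.videoSource("csi://0")

img = camera.Capture()
class_id, confidence = net.Classify(img)
print(net.GetClassDesc(class_id), confidence)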
5. Frameworks and Libraries
As a programming language for data science, Python represents a compromise between
the R language, which focuses on data analysis and visualisation, and Java, which
forms the backbone of many large-scale applications. This flexibility means that
Python can act as a single tool that brings your entire workflow together.
Python is often the first choice for developers who need to apply statistical techniques
or data analysis to their work, or for data scientists whose tasks need to be integrated
into web applications or production environments. Python particularly shines in the
area of machine learning. The combination of machine learning libraries and flexibility
makes Python unique. Python is best suited for developing sophisticated models and
predictive modules that can be directly integrated into production systems.
One of Python’s greatest strengths is its extensive library. Libraries are sets of routines
and functions written in a specific language. A robust set of libraries can make the job
of developers immensely easier to perform complex tasks without having to rewrite
many lines of code.

5.1. Development Environment


5.1.1. Jupyter-Notebook
The Jupyter Notebook application can be used to create a file in a web browser. The
advantage here is that programmes can be divided into smaller programme sections
and executed section by section. This makes working very interactive. The notebook
also offers the possibility of annotating Python programme codes with additional text.
[BS19]

5.2. Frameworks
5.2.1. TensorFlow
The TensorFlow framework, which is supported by Google, is the undisputed top dog.
It has the most GitHub activity, Google searches, Medium articles, books on Amazon
and ArXiv articles. It also has the most developers using it, and is listed in the most
online job descriptions. [Goo19b]
Hardware-specific variants of TensorFlow also exist. For example, NVIDIA graphics cards are addressed by a variant with Compute Unified Device Architecture (CUDA), and Intel also provides an optimised variant. However, you may have to do without the latest version.
TensorFlow Lite, the small device version, brings model execution to a variety of
devices, including mobile devices and IoT, and provides more than a 3x increase in
inference speed compared to the original TensorFlow.
TensorFlow uses data flow graphs to represent computation, shared state, and the
operations that mutate that state. By fully representing the dependencies of each step
of the computation, parts of the computation can be reliably parallelised [Aba+16].
Originally, TensorFlow was to be used for Google’s internal use, but was released in
November 2015 under the Apache 2.0 open source licence. Since then, TensorFlow has
brought together a variety of tools, libraries and communities. TensorFlow provides
low-level interfaces for programming languages such as Python, Java, C++ and Go.


The mid-level and high-level APIs provide functions for creating, training, saving and loading models. [Cho18]

Figure 5.1.: TensorFlow data flow diagram [Aba+16]
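A minimal sketch of this graph idea with TensorFlow 2.x: the computation below runs eagerly by default, but tf.function compiles it into a dataflow graph that TensorFlow can optimise and parallelise. The values are arbitrary illustration data.

import tensorflow as tf

# Two constant tensors; TensorFlow represents the computation on them as a graph.
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[1.0, 1.0], [0.0, 1.0]])

@tf.function           # compiles the Python function into a TensorFlow graph
def matmul(x, y):
    return tf.matmul(x, y)

print(matmul(a, b))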

5.2.2. TensorFlow Lite


Tensorflow Lite is an optimised environment for TensorFlow models for mobile devices
and the Internet of Things (IoT). Essentially, it consists of two components: the
converter, which converts pre-trained TensorFlow models into optimised TensorFlow
Lite models, and the interpreter, which enables optimised execution on the various
end devices. The goal here is to reduce the size of the existing models and to reduce
latency on less powerful devices [Goo20c].
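A minimal sketch of the converter step, assuming a model has previously been stored with model.save("saved_model_dir"); the path and file names are placeholders.

import tensorflow as tf

# Convert a previously saved TensorFlow model into the TensorFlow Lite format
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.optimizations = [tf.lite.Optimize.DEFAULT]   # enable size/latency optimisations
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)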

5.2.3. TensorRT
TensorRT is a neural network runtime environment based on CUDA, which specialises
in optimising the performance of neural networks [NVI15]. This is achieved by reducing
the computational accuracy from Floating Point 32-bit (FP32) to Floating Point 16-bit (FP16) or Integer 8-bit (INT8). Since the remaining inference accuracy is sufficient for most use cases, the speed can be significantly increased with this method [GMG16]. When optimising the network, the operators used are replaced by TensorRT operators, which can only be executed within a TensorRT environment.

5.2.4. Keras
Keras is a high-level Application Programming Interface (API) for neural networks built in Python and running on TensorFlow, Theano or CNTK. It enables the definition and training of different deep learning models using optimised tensor libraries, which serve as a backend engine. The TensorFlow
backend, Theano backend and CNTK backend implementations are currently sup-
ported. Any code written in Keras can be run on the backends without customisation.
Keras offers a choice of predefined layers, optimisation functions or other important
neural network components. The functions can also be extended with custom modules.
The two most important model types provided by Keras are the Sequential model and the functional API. With Sequential, linear models can be created whose layers are lined up one after the other. For more complex network structures, for example with branches or multiple inputs and outputs, there is the functional API. [Cho18; SIG20b]
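A minimal sketch of the Sequential model type; the layer sizes are chosen arbitrarily for a four-feature, three-class problem such as the iris dataset and are not a recommendation.

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(64, activation="relu", input_shape=(4,)),   # hidden layer
    layers.Dense(3, activation="softmax"),                   # one output per class
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()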

5.2.5. PyTorch
PyTorch, which is supported by Facebook, is the third most popular framework. It is
younger than TensorFlow and has rapidly gained popularity. It allows customisation
that TensorFlow does not. [Fac20]

5.2.6. Caffe
Caffe has been around for almost five years. It is in relatively high demand by
employers and frequently mentioned in academic articles, but has had little recent
coverage of its use. [JES20]

5.2.7. Caffe2
Caffe2 is another open source product from Facebook. It builds on Caffe and is now
in the PyTorch GitHub repository. [Sou20]

5.2.8. Theano
Theano was developed at the University of Montreal in 2007 and is the oldest major
Python framework. It has lost much of its popularity and its leader stated that
major versions are no longer on the roadmap. However, updates continue to be made.
[AR+16]
Theano uses a NumPy-like syntax to optimise and evaluate mathematical expressions.
What makes Theano different is that it uses the GPU of the computer’s graphics card.
The speed thus makes Theano interesting.

5.2.9. Apache MXNet


MXNET is supported by Apache and used by Amazon. [MXN20]

5.2.10. CNTK
CNTK is the Microsoft Cognitive Toolkit. It reminds me of many other Microsoft
products in the sense that it tries to compete with Google and Facebook offerings and
fails to gain significant acceptance. [Mic20]

5.2.11. Deeplearning4J
Deeplearning4J, also called DL4J, is used with the Java language. It is the only
semi-popular framework that is not available in Python. However, you can import
models written with Keras into DL4J. The framework has a connection to Apache
Spark and Hadoop. [Ecl20]

5.2.12. Chainer
Chainer is a framework developed by the Japanese company Preferred Networks. It
has a small following. [PN20]

5.2.13. FastAI
FastAI is built on PyTorch. Its API was inspired by Keras and requires even less
code for strong results. Jeremy Howard, the driving force behind Fast.AI, was a top
Kaggler and president of Kaggle. [fas20]
Fast.AI is not yet in demand for careers, nor is it widely used. However, it has a
large built-in pipeline of users through its popular free online courses. It is also both
powerful and easy to use. Its uptake could grow significantly.

5.3. General libraries


These are the basic libraries that transform Python from a general-purpose program-
ming language into a powerful and robust tool for data analysis and visualisation.
They are sometimes referred to as the SciPy stack and form the basis for the special
tools.

5.3.1. NumPy
NumPy is the fundamental library for scientific computing in Python, and many of
the libraries in this list use NumPy arrays as their basic inputs and outputs. In short,
NumPy introduces objects for multidimensional arrays and matrices, and routines
that allow developers to perform advanced mathematical and statistical functions on
these arrays with as little code as possible. [Fou20b]
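A small example of the array object and its vectorised operations; the values are arbitrary.

import numpy as np

a = np.array([[1, 2], [3, 4]])
print(a.mean(axis=0))   # column means
print(a @ a)            # matrix product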

5.3.2. SciPy
SciPy builds on NumPy by adding a collection of algorithms and high-level com-
mands for manipulating and visualising data. The package also includes functions
for numerically calculating integrals, solving differential equations, optimisation and
more.

5.3.3. pandas
Pandas adds data structures and tools designed for practical data analysis in finance,
statistics, social sciences and engineering. Pandas is well suited for incomplete, messy and unlabelled data (i.e. the kind of data you are likely to face in the real world) and provides tools for shaping, merging, reshaping and splitting datasets.
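A small sketch of how pandas deals with incomplete data; the values are invented for illustration.

import pandas as pd

df = pd.DataFrame({"species": ["setosa", None, "virginica"],
                   "petal_length": [1.4, 4.7, None]})

print(df.dropna())                                           # drop incomplete rows
print(df.fillna({"petal_length": df["petal_length"].mean()}))  # or fill missing values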

5.3.4. IPython
IPython extends the functionality of Python’s interactive interpreter with a souped-up
interactive shell that adds introspection, rich media, shell syntax, tab completion and
command archive retrieval. It also acts as an integrated interpreter for your programs,
which can be particularly useful for debugging. If you have ever used Mathematica or
MATLAB, you will feel at home with IPython.

5.3.5. Matplotlib
Matplotlib is the standard Python library for creating 2D diagrams and plots. The
API is quite low-level, i.e. it requires several commands to produce good-looking
graphs and figures compared to some more advanced libraries. However, the advantage
is greater flexibility. With enough commands, you can create almost any graph with
matplotlib.
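A minimal plotting example with the low-level API; the function and file name are arbitrary.

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)
plt.plot(x, np.sin(x), label="sin(x)")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.savefig("sine.png")   # write the figure to a file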

5.3.6. scikit-learn
scikit-learn builds on NumPy and SciPy by adding a set of algorithms for general ma-
chine learning and data mining tasks, including clustering, regression and classification.
As a library, scikit-learn has much to offer. Its tools are well documented and among
the contributing developers are many machine learning experts. Moreover, it is a very
helpful library for developers who do not want to choose between different versions of
the same algorithm. Its power and ease of use make the library very popular.

5.3.7. Scrapy
Scrapy is a library for creating spiderbots to systematically crawl the web and extract
structured data such as prices, contact information and URLs. Originally developed
for web scraping, Scrapy can also extract data from APIs.

5.3.8. NLTK
NLTK stands for Natural Language Toolkit and provides an effective introduction to
Natural Language Processing (NLP) or text mining with Python. The basic functions
of NLTK allow you to mark up text, identify named entities and display parse trees
that reveal parts of speech and dependencies like sentence diagrams. This gives you
the ability to do more complicated things like sentiment analysis or generate automatic
text summaries.

5.3.9. Pattern
Pattern combines the functionality of Scrapy and NLTK into a comprehensive library
that aims to serve as an out-of-the-box solution for web mining, NLP, machine learning
and network analysis. Its tools include a web crawler; APIs for Google, Twitter and
Wikipedia; and text analytics algorithms such as parse trees and sentiment analysis
that can be run with just a few lines of code.

5.3.10. Seaborn
Seaborn is a popular visualisation library built on top of matplotlib. With Seaborn,
graphically very high quality plots such as heat maps, time series and violin plots can
be generated.

5.3.11. Bokeh
Bokeh allows the creation of interactive, zoomable plots in modern web browsers using
JavaScript widgets. Another nice feature of Bokeh is that it comes with three levels of
user interface, from high-level abstractions that let you quickly create complex plots
to a low-level view that provides maximum flexibility for app developers.

5.3.12. basemap
Basemap supports adding simple maps to matplotlib by taking coordinates from
matplotlib and applying them to more than 25 different projections. The library
Folium builds on Basemap and allows the creation of interactive web maps, similar to
the JavaScript widgets of Bokeh.

5.3.13. NetworkX
This library allows you to create and analyse graphs and networks. It is designed
to work with both standard and non-standard data formats, making it particularly
efficient and scalable. With these features, NetworkX is particularly well suited for
analysing complex social networks.

5.3.14. LightGBM
Gradient boosting is one of the best and most popular machine learning techniques; it builds strong models by combining simple base models, in particular decision trees.

Accordingly, there are special libraries designed for a fast and efficient implementation
of this method. These are LightGBM, XGBoost and CatBoost. All these libraries are
competitors but help solve a common problem and can be used in almost similar ways.

5.3.15. Eli5
Machine learning models often behave like black boxes whose predictions are difficult to interpret. Eli5, a Python library, helps to overcome this challenge. It combines visualisation and debugging of machine learning models and allows the individual steps of an algorithm to be traced.

5.3.16. Mlpy - Machine Learning


As an alternative to scikit-learn, Mlpy also offers a powerful library of functions
for machine learning. Mlpy is also based on NumPy and SciPy, but extends the
functionality with supervised and unsupervised machine learning methods.

5.3.17. Statsmodels - Statistical Data Analysis


Statsmodels is a Python module that allows users to explore data, estimate statistical
models and run statistical tests. An extensive list of descriptive statistics, statistical
tests, plotting functions and result statistics is available for various data types and
each estimator. The module makes predictive analytics possible. Statsmodels is often
combined with NumPy, matplotlib and Pandas.

5.4. Specialised Libraries


5.4.1. OpenCV
OpenCV, derived from Open Computer Vision, is a free program library with algo-
rithms for image processing and computer vision. The development of the library
was initiated by Intel and was maintained by Willow Garage until 2013. After their
dissolution, it was continued by Itseez, which has since been acquired by Intel. [Tea20a]

5.4.2. OpenVINO
The OpenVINO toolkit helps accelerate the development of powerful computer vision
and deep learning in vision applications. It enables Deep Learning via hardware
accelerators as well as simple, heterogeneous execution on Intel platforms, including CPU, GPU, Field-Programmable Gate Array (FPGA) and Video Processing Unit (VPU). Key components include optimised functions for OpenCV. [Int19b; Tea20b]

5.4.3. Compute Unified Device Architecture (CUDA)


The abbreviation CUDA stands for Compute Unified Device Architecture. It is an
interface technology and computing platform that allows graphics processors to be
addressed and used for non-graphics-specific computations. CUDA was developed by
NVIDIA and accelerates programmes by parallelising certain parts of the programme
with one or more GPUs in addition to the Central Processing Unit (CPU). [TS19;
Cor20; NVI20c]

5.4.4. OpenNN
OpenNN is a software library written in C++ for advanced analysis. It implements
neural networks. This library is characterised by its execution speed and memory
allocation. It is constantly optimised and parallelised to maximise its efficiency.
[AIT20]

5.5. Special Libraries


When programming the neural network, some predefined libraries are used. In the following, some interesting libraries are presented.

5.5.1. glob
The glob module finds all pathnames in a given directory that match a given pattern and returns them in arbitrary order. [Fou20a]
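A small example; the directory and pattern are assumed for illustration.

import glob

# All JPEG files in a hypothetical image directory
for path in glob.glob("images/*.jpg"):
    print(path)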

5.5.2. os
The os module provides general operating system functionalities. The functions of this library allow programmes to be designed in a platform-independent way. The abbreviation os stands for Operating System. It enables access to and reference to certain paths and the files stored there. [Bal19]
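A small example of platform-independent path handling; the path components are placeholders.

import os

print(os.getcwd())                                  # current working directory
path = os.path.join("data", "images", "example.jpg")  # platform-independent path
print(os.path.exists(path))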

5.5.3. PIL
The abbreviation PIL stands for Python Image Library. It provides many functions for image processing and guarantees fast data access to the basic image formats. In addition to image archiving and batch processing, it can convert file formats, create thumbnails, print images and perform many other functions.

5.5.4. LCC15
5.5.5. math
As can easily be inferred from the name, this library provides access to the mathematical functions defined in the C standard. These include, among others, trigonometric functions, power and logarithm functions and angle conversion functions. Complex numbers are not supported by this library. [Fou20b]

5.5.6. cv2
With the cv2 module, input images can be read in as three-dimensional arrays. OpenCV contains algorithms for image processing and machine vision. The algorithms are based on current research results and are continuously being further developed. The image processing modules include, for example, face recognition as well as many fast filters and functions for camera calibration. The machine vision modules include boosting (automatic classification), decision tree learning and artificial neural networks. [Wik20]
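A small example of reading an image as an array and converting it to greyscale; the file names are placeholders.

import cv2

img = cv2.imread("example.jpg")                     # hypothetical file; BGR array of shape (H, W, 3)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)        # convert to a single-channel image
cv2.imwrite("example_gray.jpg", gray)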

5.5.7. random
The random module implements pseudo-random number generators for different distributions. For example, random numbers can be generated or a sequence of elements can be shuffled. [Fou20c]

5.5.8. pickle
The pickle module implements binary protocols for serialising and deserialising a Python object structure. Object hierarchies can be converted into a binary stream (pickle) and later converted back from the binary representation into the object hierarchy (unpickle). [Fou20d]
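A small example of pickling and unpickling an object; the file name and contents are placeholders.

import pickle

data = {"model": "resnet-18", "accuracy": 0.91}

with open("data.pkl", "wb") as f:
    pickle.dump(data, f)          # serialise to a binary file

with open("data.pkl", "rb") as f:
    restored = pickle.load(f)     # deserialise back into a Python object
print(restored)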

5.5.9. PyPI
The Python Software Foundation organises and manages various packages for Python programming via the Python Package Index (PyPI). [Fou21a] Here you can find packages for various tasks. An example is the official API for accessing the dataset COCO. [Fou21b]
6. Libraries, Modules, and Frameworks
Libraries, modules, and frameworks play an essential role in the fields of machine learning and computer vision. They are written for a specific topic and provide built-in functions and classes that help developers and data scientists to build useful applications. Some useful libraries, modules, and frameworks used in Machine Learning (ML) and Computer Vision (CV) are:

• TensorFlow

• MediaPipe

• OpenCV

• Pyfirmata

6.1. TensorFlow Framework


TensorFlow is a machine learning system that operates at large scale and in het-
erogeneous environments. It uses dataflow graphs to represent computation, shared
state, and the operations that mutate that state. TensorFlow can train and run
deep neural networks for handwritten digit classification, image recognition, word
embeddings, recurrent neural networks, sequence-to-sequence models for machine
translation, natural language processing, and PDE (partial differential equation) based
simulations. It maps the nodes of a dataflow graph across many machines in a cluster,
and within a machine across multiple computational devices, including multicore CPUs,
general-purpose GPUs, and custom-designed ASICs known as Tensor Processing Units
(TPUs). TensorFlow enables developers to experiment with novel optimizations and training algorithms. TensorFlow supports a variety of applications, with a focus on training and inference on deep neural networks. Several Google services use TensorFlow in production, it has been released as an open-source project, and it has become widely used for machine learning research. [Aba+16]

6.1.1. TensorFlow Use Case


TensorFlow is the framework made by Google for building machine learning applications and training models. The Google search engine is also based on TensorFlow AI applications: when a user types "a" into the Google search bar, the search engine predicts the most suitable complete word to select, possibly also depending on previous searches by the user on the same system. TensorFlow can be used across a range of tasks but has a particular focus on training and inference of deep neural networks. TensorFlow offers a variety of workflows to develop and train models using the Python and JavaScript programming languages. After training, the model can be deployed in the cloud or on any device as an edge computing application, regardless of which language is used on the device. The most important part of every machine learning application is training the ML model, so that the model can then be applied under real-time conditions and take decisions without human intervention, much as a human would.


6.2. OpenCV
OpenCV is a huge open-source library for computer vision, machine learning, and image processing. It plays a major role in real-time operation, which is very important in today's systems. Computer vision is the science of programming a computer to process and ultimately understand images and video, or, simply put, making a computer see. Solving even small parts of certain computer vision challenges creates exciting new possibilities in technology, engineering and even entertainment. By using OpenCV, one can process images and videos to identify objects, faces, or even the handwriting of a human. When we create computer vision applications that we do not want to build from scratch, we can use this library to focus on the real-world problems instead. Many companies use this library today, such as Google, Amazon, Microsoft and Toyota.

6.2.1. Application of OpenCV


There are lots of applications which are solved using OpenCV, some of them are listed
below.

• Face recognition

• Automated inspection and surveillance

• Number of people – count (foot traffic in a mall, etc)

• Vehicle counting on highways along with their speeds

• Interactive art installations

• Anomaly (defect) detection in the manufacturing process (the odd defective products)

• Street view image stitching

• Video/image search and retrieval

• Robot and driver-less car navigation and control

• Object recognition

• Medical image analysis

• Movies – 3D structure from motion

• TV Channels advertisement recognition

6.2.2. Image Processing


Image processing is one of the basic functionalities of OpenCV. It is a method of performing operations on an image in order to obtain an enhanced image and/or to extract useful information from it.

6.2.3. How does a computer read an image


As humans, we can easily recognise that an image shows a particular person. But if we ask the computer "is this my photo?", the computer cannot say anything, because it does not figure this out on its own. [Image Processing] The computer reads any image as a grid of values between 0 and 255. For any colour image, there are three primary channels: red, green and blue.
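A small sketch that makes this visible; the file name is a placeholder.

import cv2

img = cv2.imread("photo.jpg")   # hypothetical colour image
print(img.shape)                # e.g. (480, 640, 3): height, width, 3 channels (B, G, R)
print(img.dtype)                # uint8: each value lies between 0 and 255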

6.3. MediaPipe
MediaPipe is one of the most widely shared and re-usable libraries for media processing within Google. The open-source MediaPipe was first introduced by Google in June 2019. It aims to provide some integrated computer vision and machine learning features. It has built-in modules that can detect human motion by tracking different parts of the body. The GitHub link [MediaPipe Github] shows all the available applications that the MediaPipe modules are able to perform. It is well suited for gesture detection in edge computing applications.

6.4. MediaPipe Applications


MediaPipe has numerous applications in machine learning and computer vision. It is a Python-friendly library and performs well on edge devices too. The module is trained on about 30,000 images, so its accuracy and precision are high. A further reason for using MediaPipe is that it works largely independently of the background, because it detects just the landmarks of the different parts of the body; everything else in the frame is invisible to the MediaPipe module. The most important use cases of MediaPipe are:

• Hand Landmarks

• Face Detection

• Object Detection

• Holistic

6.4.1. Hand Landmarks Detection


The ability to perceive the shape and motion of hands can be a vital component in improving the user experience across a variety of technological domains and platforms. Every hand has 21 landmarks: each finger has four, and there is one landmark in the middle of the hand. For gesture detection we can use this technique to define different types of gestures; different gestures correspond to different positions of the landmarks. MediaPipe Hands is a high-fidelity hand and finger tracking solution. It employs machine learning (ML) to infer 21 3D landmarks of a hand from just a single frame. Whereas current state-of-the-art approaches rely primarily on powerful desktop environments for inference, this method achieves real-time performance on a mobile phone, and even scales to multiple hands. Figure 6.1 shows all the hand landmarks, which can help us to define different gestures. [Hand Landmarks]

Figure 6.1.: Hand Landmarks
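A minimal sketch of reading out the 21 hand landmarks with MediaPipe; the input file is a placeholder, and the parameter values are example choices rather than recommended settings.

import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=2)

image = cv2.imread("hand.jpg")                                   # hypothetical input image
results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))  # MediaPipe expects RGB

if results.multi_hand_landmarks:
    for hand in results.multi_hand_landmarks:
        for lm in hand.landmark:          # 21 landmarks with normalised coordinates
            print(lm.x, lm.y, lm.z)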



6.4.2. Face Detection


MediaPipe Face Detection is an ultrafast face detection solution that can detect six landmarks on the face and also supports multiple faces. Figure 6.2 shows the detection of multiple faces using landmarks. The detector’s super-real-time performance enables it to be applied to any live viewfinder experience that requires an accurate facial region of interest as an input for other task-specific models, such as 3D facial keypoint or geometry estimation (e.g., MediaPipe Face Mesh), facial feature or expression classification, and face region segmentation. [Face Detection]

Figure 6.2.: Face Detection Using Landmarks


Part II.

Jetson Nano

7. Jetson Nano
7.1. The Jetson Nano and the Competition
System-on-a-chip (SOC) systems offer a low-cost entry point for developing autonomous,
low-power deep-learning applications. They are best suited for inference, i.e. for
applying pre-trained machine learning models to real data. The Jetson Nano from
the manufacturer NVIDIA is such an SOC system. It is the smallest AI module in NVIDIA’s Jetson family and promises high processing speed with moderate power consumption at the same time. [NVI19]

At the lower end of the price spectrum, several low-cost systems have come to market
in recent years that are suitable for developing machine learning applications for
embedded systems. Chips and systems have emerged that have special computational
kernels for such tasks, with low power consumption at the same time.
For developers, Intel has created an inexpensive solution with the Neural Compute
Stick, for example, which can be connected to existing hardware such as a Raspberry
Pi via USB. For around 70 euros, you get a system with 16 cores that is based on
Intel’s own low-power processor chip for image processing, Movidius. The product
from Intel uses Movidius Myriad X Vision Processing Unit and can be used with the
operating systems Windows® 10, Ubuntu or MacOS. [Int19a]

Google is trying to address the same target group with the Google Edge TPU Boards.
For 80 euros, you get a Tensor Processing Unit (TPU) that uses similar hardware to
Google Cloud or Google’s development environment for Jupyter notebooks Colab and
can also be used as a USB 3 dongle. [Vin18; Goo19a; Goo11b]

Comparable technology can also be found in the latest generation of Apple’s iPhone
with the A12 Bionic chip with 8 cores and 5 trillion operations per second, in Samsung’s
Exynos 980 with integrated Neural Processing Unit (NPU) or in Huawei’s Kirin 990
with two NPUs at once. [Sam20; HDC20]

7.2. Applications of the Jetson Nano


There are several use cases for the Jetson Nano. It can be used for very basic tasks
that can also be done with devices like the Raspberry Pi. It can be used as a small,
powerful computer in conventional use cases such as running a web server or using it
to surf the web and create documents, or as a low-end gaming computer. [Kur21]

But the real added value of the Jetson Nano is in its design for easy implementation of artificial intelligence. What makes this so easy is the NVIDIA TensorRT software development kit, which enables high-performance deep learning inference. An already trained network can be implemented and evolve as it is confronted with the new data available in the specific application.

NVidia’s [NVI20a] website features a variety of projects from the Jetson community.
Besides simple teleoperation, the Jetson Nano also allows the realisation of collision
avoidance and object tracking up to the further construction of a small autonomous
vehicle. In addition to the non-commercial projects, industrial projects are also


presented. One example is the Adaptive Traffic Controlling System, which uses a
visual traffic congestion measurement method to adjust traffic light timing to reduce
vehicle congestion. Another project enables the visually or reading impaired to hear
both printed and handwritten text by converting recognised sentences into synthesised
speech. This project also basically uses only the Jetson Nano and the Pi camera.
Tensorflow 2.0 and Google Colab are used to train the model. The recognition of sign
language, faces and objects can also be realised with Jetson Nano and pre-trained
models.
8. Application “Image Recognition
with the Jetson”
8.1. Application Example “Bottle Detection”
Due to growing data analytics capabilities, developments such as predictive mainte-
nance and intelligent fault detection are being driven in industry. By using artificial
intelligence, fewer or no human decisions are required and more complex relationships
can be identified. The goal here is often quality control or early detection of error
sources or expected errors.

A simpler application is the rapid detection of faults that have already occurred.
A defect can be a foreign object or unwanted object that can enter the system in
any phase such as production, manufacturing, assembly or logistics of a product
development. To avoid this or to detect it at an early stage, an artificial intelligence
system can be used to continuously monitor the system in the required phases.

In this example, bottles are to be detected in any environment. The object detection
is to be continuous. A camera is used to capture the environment. The collected data
is verified with the NVIDIA Jetson Nano. If a bottle is detected, the Jetson Nano
shall provide a binary signal to a connected system. In industry, this system could be a programmable logic controller (Speicherprogrammierbare Steuerung, SPS) that controls a machine.

After starting the programme, the live image of the camera is permanently displayed
on the screen and the result of the image recognition is shown, as can be seen in
Figure 8.1. This makes it easy to check the results of the Jetson Nano during the
development and testing phase.

Figure 8.1.: Application example: Bottle detection


WS:Quelle

8.2. Necessary Hardware


A Jetson Nano with camera system is required for the optical bottle detection applica-
tion. Since the application is for demonstrating the possibilities, this project does not
use a high quality camera, but a Raspberry Pi CAM [Mon17]. The choice of camera


and lens will of course influence successful detection. In an industrial application, careful research has to be done here; in any case, the environment has to be taken into
account. Resolution of the camera, lighting conditions, humidity, splash water and
vibrations are typical issues here. For a demonstrator, the Raspberry Pi camera is
perfectly adequate. A complete material list is given in the appendix ??. The Jetson
Nano Developer Kit is presented in detail in chapter 9. The camera and its features
are highlighted in chapter ??. In addition, a computer with an internet connection
and the ability to write to a microSD card is also required; however, this is not listed
in the materials list.
9. NVIDIA Jetson Nano Developer Kit
Starting with the NVIDIA Jetson Nano Developer Kit, we will now describe the
construction and commissioning of the Jetson Nano. The description of the company
NVIDIA [NVI19] serves as a basis for this. First, you must familiarise yourself with
the hardware. Then the operating system is installed and started for the first time.
This is a standard procedure, which is independent of the application. After that, the
software of the application can be installed and tested.

9.1. NVIDIA Jetson Nano Developer Kit


The NVIDIA Jetson Nano is a Single-Board-Computer (SBC) designed specifically for running machine learning algorithms. The NVIDIA Jetson Nano is built in such a way that you have direct access to the components, inputs and outputs. However, to protect the computer, a suitable enclosure should be used.
Similar to the Raspberry Pi, the Jetson Nano exposes all its components and connectors. The packaging of the Jetson Nano kit can be used as a simple stand for the board. However, for better protection, it is recommended to pack it in an additional case. The illustration ?? shows the Jetson Nano from above. When the Jetson Nano is used in this book, it is always described in this orientation; this should be noted, as confusion can occur, especially with the digital inputs and outputs.

9.2. Construction of the Jetson Nano


The NVIDIA Jetson Nano has a dimension of about 70 × 45mm without the case. The
large heat sink dominates the board. Hidden under the heat sink is a very powerful
microchip, the NVIDIA Tegra X1, which contains four ARM application processor
cores and 128 graphics processing units (GPU). The figure ?? shows the top view of
the NVIDIA Jetson Nano with its interfaces and the figure ?? shows it from the side
with the sockets.

External power supply


The Jetson Nano requires an external power supply. A DC power supply (5V, 2A) must be available for this purpose. The power can be supplied either via the 5V DC barrel connector or via the Micro USB interface. If the 5V DC connector is used, a jumper is required to bridge the two pins of J48 that are located between the camera and power supply connector. The diameter of the connector is 2.1mm. The centre is the positive pole. If the power is supplied via the Micro USB interface, the jumper must be removed.

HDMI and display port


A monitor is not necessary for later operation, but it is very useful for commissioning
and some applications. An external monitor can be connected either to the HDMI
interface or to the display port. Various resolutions with up to 4K are available.
It should be noted that HDMI-to-DVI adapters are not supported. Therefore, a
monitor that directly supports HDMI or display port input must be used. The table ?? lists the possible image resolutions that are supported.


Figure 9.1.: Jetson Nano: top view

(Labels in Figure 9.2: J44 Serial, J38 PoE, J40 Buttons, J13 Camera, J41 Expansion, J48 Jumper 5V, J15 Fan, PWR LED)

Figure 9.2.: Jetson Nano: top view with labels



(Labels in Figure 9.3: 5V DC, HDMI, DisplayPort, 4 × USB 3.0, Gigabit Ethernet, Micro USB (power or USB device), PWR LED)

Figure 9.3.: Side view of the Jetson Nano with labels

Video Encode (250 MP/sec):
H.264/H.265: 1 × 4K at 30 fps
H.264/H.265: 2 × 1080p at 60 fps
H.264/H.265: 4 × 1080p at 30 fps
H.264/H.265: 4 × 720p at 60 fps
H.264/H.265: 9 × 720p at 30 fps

Table 9.1.: Supported video encoding formats

If no monitor is to be used in the application, this must be taken into account when
installing the operating system.

µUSB interface

A power supply (5V 2A) with µUSB output can also be used at the µUSB interface.
µUSB was chosen because inexpensive power supplies with this interface are easy to
find. The maximum power consumption is 5W to 10W. A high quality power supply
unit should be chosen. If an unsuitable power supply is used, the NVIDIA Jetson
Nano will not start. A simple measure is to use a power supply with a shorter cable,
as this leads to a lower voltage drop.

USB 3.0 interfaces

The Jetson Nano has 4 USB 3.0 interfaces. The current consumption on a USB
interface is limited to 100mA. If the connected device has a higher consumption, it
must be equipped with an additional power supply. As a rule, the USB keyboard and
mouse can be connected.

Ethernet interface

With the help of the Ethernet interface with RJ45 connection, the computer can either
be connected directly to the Internet or communicate with another computer. Gigabit
Ethernet can be used with a patch cable plugged into the router. Ethernet connections
based on 10/100/1000BASE-T are supported. However, a WiFi connection can also
be achieved with an optional USB dongle for WLAN.

Status LED

The status LED indicates whether the Jetson Nano is receiving power. The status
LED will flash if there is insufficient or too much power from the power supply.

GPIOs
The Jetson Nano has an interface through which extensions can be connected. There
are 40 connections to the outside at the J41 interface. They are called General Purpose
Input Output (GPIO). With these, custom circuits can be connected and the board can be provided with new functions. LEDs or motors can be connected here. In addition, other interesting interfaces of the types SPI, I2C and I2S are available.

CSI connection for a camera


The Jetson Nano is additionally equipped with a camera CSI connector. The exact
specification is 12 lanes (3x4 or 4x2) MIPI CSI-2 D-PHY 1.1 (1.5 Gb/s per
pair). If suitable software is used, the Jetson Nano can be used for image recognition.

microSD card
The Jetson Nano does not have a hard disk with an operating system on it. Therefore,
there must be another way to store it. For this purpose, the Jetson Nano uses a
microSD card. The SD card contains the operating system that is to be started and
the application programmes. Accordingly, the SD card must be chosen. It should
therefore be fast and large enough; the recommended minimum is a 16GB UHS-1 card.
The slot is on the bottom of the Jetson Nano.

9.2.1. Technical data


WS:Ausführen GPU
NVIDIA Maxwell architecture with 128 NVIDIA CUDA® cores
Prozessor
CPU
Quad-core ARM Cortex-A57 MPCore processor
Memory
4 GB 64-bit LPDDR4, 1600MHz 25.6 GB/s
Storage
16 GB eMMC 5.1
Every notebook has an embedded multi media card (eMMC), a hard disk or an SSD
- or a combination of these. An embedded multi media card is technically similar
to an SD card and is a cheap option for manufacturers to offer 16 to 128 GBytes of
permanently soldered storage space. Like SSDs, EMMCs offer low access times but
slow sequential data transfer rates.

9.3. ToDo
Advertising, sources missing, see PDF documents regarding the Raspberry Pi as a template

9.4. Interfaces of the Jetson Nano


9.4.1. J41 Expansion Header Pinout - GPIO Pins
The Jetson Nano can also communicate with other systems. For this communication
or to control external devices, it can be helpful to be able to control the pins of the
J41 header. The pins allow different types of communication:

• Serial communication Universal Asynchronous Receiver Transmitter (UART)



(Labels in Figure 9.4: CTS, TXD, RXD, N/A, RTS, GND)

Figure 9.4.: J44 Serial interface - 3.3V, 115200 baud, 8N1

(Labels in Figure 9.5: DIS (disable auto power-on), RST (system reset), FRC (board recovery mode), ON (power on))

Figure 9.5.: J40 Buttons

• Communication via Inter-Integrated Circuit (I2 C)

• Serial Peripheral Interface (SPI)

• Inter-IC Sound (I2 S)

• Direct communication via GPIO

Binary signals are exchanged between two systems via the GPIO pins. Thus, they are simply digital inputs and outputs that can be defined as inputs or outputs as required.
By default, all interface signal pins of the Jetson Nano are configured as GPIO. Pins 3, 5, 8, 10, 27 and 28 are exceptions to this. Pins 3 and 5 and pins 27 and 28 are used for the two I2C interfaces, with the data line Serial Data (SDA) and the clock line Serial Clock (SCL). Pins 8 and 10 are reserved for the transmit line (Tx) and the receive line (Rx) of the UART. [NVI21a]
To use the GPIO pins, the GPIO library must be installed [NVI20b]. The pinout of the Jetson Nano is shown in Table 9.2.
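A minimal sketch of toggling such a binary output with the Jetson.GPIO library; the choice of pin 12 is an arbitrary example, and the connected circuit (for instance an LED with a series resistor) is assumed.

import time
import Jetson.GPIO as GPIO

GPIO.setmode(GPIO.BOARD)      # use the physical pin numbers of the J41 header
GPIO.setup(12, GPIO.OUT)      # pin 12 as a digital output (example choice)

for _ in range(5):
    GPIO.output(12, GPIO.HIGH)   # switch the output on
    time.sleep(0.5)
    GPIO.output(12, GPIO.LOW)    # switch the output off
    time.sleep(0.5)

GPIO.cleanup()                # release the pins again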

9.4.2. Serial interface J44 UART


The Jetson Nano has two UART type interfaces. The serial UART can be used for
applications that require bidirectional communication.

9.4.3. Interface J40 Buttons


9.4.4. I2 C-Interface
The serial data bus I2 C allows the master unit to send data to a slave unit or request
data from a slave unit.

9.4.5. I2 S-Interface
9.4.6. Serial Peripheral Interface (SPI)
In the default configuration, the Jetson Nano does not provide SPI port access. However, this can be configured on the J41 expansion header interface.

Sysfs GPIO  Name                              Pin | Pin  Name                              Sysfs GPIO
gpio78      I2S_4_SDOUT (D21)                  40 |  39  GND
gpio77      I2S_4_SDIN (D20)                   38 |  37  SPI_2_MOSI (D26)                  gpio12
gpio51      UART_2_CTS (D16)                   36 |  35  I2S_4_LRCK (D19)                  gpio76
            GND                                34 |  33  GPIO_PE6 (D13)                    gpio38
gpio168     LCD_BL_PWM (D12)                   32 |  31  GPIO_PZ0 (D6)                     gpio200
            GND                                30 |  29  CAM_AF_EN (D5)                    gpio149
            I2C_1_SCL (D1/I2C Bus 0)           28 |  27  I2C_1_SDA (D0/I2C Bus 0)
gpio20      SPI_1_CS1 (D7)                     26 |  25  GND
gpio19      SPI_1_CS0 (D8)                     24 |  23  SPI_1_SCK (D11)                   gpio18
gpio13      SPI_2_MISO (D25)                   22 |  21  SPI_1_MISO (D9)                   gpio17
            GND                                20 |  19  SPI_1_MOSI (D10)                  gpio16
gpio15      SPI_2_CS0 (D24)                    18 |  17  3.3V
gpio232     SPI_2_CS1 (D23)                    16 |  15  LCD_TE (D22)                      gpio194
            GND                                14 |  13  SPI_2_SCK (D27)                   gpio14
gpio79      I2S_4_SCLK (D18)                   12 |  11  UART_2_RTS (D17)                  gpio50
            UART_2_RX (/dev/ttyTHS1 - D15)     10 |   9  GND
            UART_2_TX (/dev/ttyTHS1 - D14)      8 |   7  AUDIO_MCLK (D4)                   gpio216
            GND                                 6 |   5  I2C_2_SCL (I2C Bus 1 - D3)
            5V                                  4 |   3  I2C_2_SDA (I2C Bus 1 - D2)
            5V                                  2 |   1  3.3V

Table 9.2.: Pinout of the GPIOs of the Jetson Nano



Figure 9.6.: J15 5V fan header (pins: PWM, Tach, 5V, GND)

9.4.7. Power over Ethernet


9.4.8. Interface J15 to the fan

9.5. Structure of the Jetson Nano


• housing
• WLAN
• monitor
• keyboard

9.6. JetPack operating system for the Jetson Nano


9.6.1. Description of the operating system
NVIDIA provides its own operating system, „JetPack“, for the Jetson Nano. This operating system is not only suitable for the Jetson Nano but for the whole Jetson family, consisting of the Jetson Nano, Jetson AGX Xavier and Jetson TX2. The operating system has a Linux kernel and features the Debian package management tools. Therefore, one can work with the system on the basis of common Linux knowledge. In particular, it supports various machine learning modules:

• CUDA,
• TensorRT,
• Computer Vision,
• Deep Stream SDK.

This enables development for different hardware systems, so the performance can be scaled depending on the application. [AY18; NVI21b] Figure 9.7 shows the structure of the development environment with the modules mentioned and others. The possible uses of the modules are also shown.

9.6.2. Installing the operating system


The operating system must be provided on a microSD card. An image file can be found on the NVIDIA website and is downloaded to a PC. It is written to the microSD card using a special programme that takes the file system of JetPack into account, which is the same file system used by Linux. The microSD card is then inserted into the Jetson Nano. The first time the system is started, it can be configured. This includes, among other things, the installation with or without a graphical user interface. The individual steps are explained below.

Figure 9.7.: Jetson Nano SDK, source: Heise, 29.11.2019

Writing the operating system to the microSD card


Using a computer that has an internet connection, the operating system is loaded
from the NVIDIA website [NVI21a; NVI20b]. The operating system can be copied
to a microSD card using the usual tools, e.g. Etcher, available from the website
https://www.balena.io/etcher/. [NVI19]
First, the microSD card must be formatted:

1. Windows:
• Format the microSD card with the SD Memory Card Formatter from the SD Association [Link]. The SD Memory Card Formatter must be installed and started. After starting the programme, a dialogue opens, see figure 9.8.
The following steps are then carried out:
a) First, the microSD card is inserted into an SD card drive.
b) Then the correct card drive is selected.
c) The type „Quick format“ is selected.
d) The field „Volume label“ must not be changed.
e) After that, formatting is started by clicking the button „Format“. A warning dialogue opens first, which must be confirmed with the button „Yes“.
After formatting the microSD card, you can write the image file to the microSD card using the Etcher programme. The following steps are necessary for this:
a) The programme must be installed and started. After starting the Etcher programme, a dialogue opens, see figure 9.9.

Figure 9.8.: Dialogue for formatting an SD card

Figure 9.9.: Etcher start dialogue



Figure 9.10.: Unformatted file

Figure 9.11.: Windows message for an unformatted microSD card

b) In this start dialogue, the button „Select image“ is clicked first. Then the downloaded image file can be selected.
If the microSD card is not formatted, the operating system reports this, see figure 9.10. In that case the process must be aborted and the card must first be formatted.
c) Pressing the button „Flash!“ starts the copying process. The process takes about 10 minutes if a USB 3 interface is used. During this time Etcher writes the image file and then validates it.
d) After the process is complete, Windows may report that the microSD card cannot be read, see figure 9.11. This message can be ignored. The microSD card can simply be removed from the drive.

Removing the microSD card from the drive completes its preparation.

9.6.3. Setup with graphical user interface and first start


A microSD card containing the operating system is now available. Now the Jetson
Nano can be prepared to configure and use the operating system. There are two basic
options:

• use with graphical interface


• use without graphical user interface

This decision must be made before the first start; it cannot be changed later. The only way to switch afterwards is to reinstall the operating system, which means that all settings must be repeated and any software not provided by the operating system must be installed again.

This section describes how to configure a graphical user interface.

Before switching on the Jetson Nano, some preparations must be made. First, a monitor must be connected via an HDMI cable, along with a keyboard and a mouse. The Jetson Nano and a suitable power supply unit must be at hand. However, the power supply unit must not yet be plugged in, as the Jetson Nano starts automatically as soon as it receives power. Under these conditions, the prepared microSD card is inserted into the Jetson Nano. Now the system can be switched on. First the monitor is switched on, then the power supply unit is connected to the Jetson Nano. The green PWR LED lights up and the system starts automatically. The boot process takes a few minutes. The system must then be configured during the initial start-up.

Figure 9.12.: Welcome message after the first start

First start
During the first boot process, some settings have to be made:

First, the End User License Agreement (EULA) from NVIDIA for the Jetson Nano
software is displayed. This agreement must be accepted. Then the language must
be set. The keyboard layout can also be selected. This should match the connected
keyboard. A suitable time zone must also be selected. Then the settings must be made,
which will be requested at the next login. The following labels are recommended:

• Setting of the name: MSR-Lab

• Designation of the computer name: JetsonNanoMSRLab

• User name: msrjetson

• Setting of the password: MSR

• Login type: automatic login

Finally, the size of the partition must be specified. For the standard application, the
maximum size must be specified here.

After logging in
After successful configuration, a welcome message appears, see figure 9.12. The system can now be used.

9.6.4. Setup without graphic interface


If the Jetson Nano is used as an edge computer, there is usually not an option to
connect a monitor or keyboard. In this case, the system must be configured so that
no interface is installed. This is described in this section.
Preparations must again be made for this. The following parts must be provided:

• A microSD card with the operating system,

• Jetson Nano,

• 5V power supply with jack plug,

• Ethernet cable,

• USB cable with micro-USB plug,

• PC,

• Ethernet connector.

Since no graphical interface is installed on the Jetson Nano, it must be connected to a computer. The Jetson Nano must also be connected to the Internet via Ethernet, so only the micro-USB port remains for the connection to the computer. The micro-USB port is therefore not available for supplying power, and a power supply unit with a jack plug must be used in this case. Before connecting the power supply, the Jetson Nano must be connected to the computer via the USB interface and to the Internet via an Ethernet cable. A monitor must not be connected: during the first start-up of the Jetson Nano this is checked, and if a monitor is connected, the graphical user interface is installed. Now the microSD card can be inserted into the Jetson Nano, and the Jetson Nano can be switched on.

9.7. Remote access to the Jetson Nano


Access via Linux or MacOS
After switching on the Jetson Nano, search for the USB device in /dev on the Linux or
MacOS host. For example, this is tty.usbmodemFD133. This device is then connected
to using the terminal emulator „screen“:
screen /dev/tty.usbmodemFD133 115200

Access via Windows


Under Windows, the interface of the Jetson Nano is found as a COM port. With the
programme „putty“ the connection via the serial interface can be started.

Configuration of the Jetson Nano


If the connection is successful, the installation wizard oem-config is used to set the
required data. As with the installation with a graphical interface, the End User License
Agreement (EULA) from NVIDIA for the Jetson Nano software is displayed first. This
agreement must be accepted. Then the language must be set. The keyboard layout
can also be selected. This should match the connected keyboard. A suitable time zone
must also be selected. Then the settings must be made, which will be requested at
the next login. The following labels are recommended:

• setting of the name: MSR-Lab

• Designation of the computer name:JetsonNanoMSRLab



• User name creation: msrjetson


• Setting the password: MSR
• Login type: automatic login

After that, the Jetson Nano is available. You can then connect to the Jetson Nano via ssh or putty:
ssh elmar@JetsonNanoMSRLab

9.7.1. Switching on and activating Secure Shell


Secure Shell (SSH) enables access to the Jetson Nano even if it is not equipped with
a keyboard, mouse and monitor. The basic requirement is that the Jetson Nano is
connected to the network via Ethernet or WLAN.

Installation of SSH
An installation of SSH is not necessary, as it is included in the distribution of the
Jetson Nano’s operating system. This only needs to be activated.

Activation of SSH
To activate SSH, the SSH server is started. This is done by the following command:
sudo /etc/init.d/ssh start
Since this is necessary with every use of SSH, this process is automated:
sudo update-rc.d ssh defaults
After this entry, SSH is permanently available, even after a reboot. Access via SSH to
the Jetson Nano is then always possible.

Another possibility is to place an empty file named „ssh“ in the root directory. If the
file exists, SSH is automatically activated at the next boot.

You can activate SSH access via the system settings. To do this, open the configuration
dialogue in the start menu Settings Configuration , switch to the tab „Interfaces“ and set
the item „SSH“ to „Enabled“. After confirmation, SSH is also permanently available.
The following command is used to open the dialogue:
sudo raspi-config
The Jetson Nano must then be restarted:
sudo reboot

Access via user name and IP address


For SSH access, the user name and password of the Jetson Nano must be known. The user name and associated password are specified during configuration at the first start-up.

In order to use SSH, the IP address of the Jetson Nano must be known. With the
terminal command
ifconfig
the current network settings including the IP address are output.

SSH-Client under Windows

Since 2017, Windows has had an SSH implementation based on OpenSSH, which is integrated into the PowerShell command line. To use it, PowerShell must be opened via the Start menu and the following command entered:

ssh username@<IPAddressOfTheJetsonNano>

When connecting for the first time, the SSH keys of the Jetson Nano must be confirmed
with „yes“. After entering the user password, remote access can be performed.

9.7.2. Remote Desktop


If the Jetson Nano is to be operated without a monitor, keyboard and mouse, then a
remote desktop access server called xrdp should be used. The installation is simple:

$ sudo apt-get install xrdp

After the installation is complete, the Jetson Nano must be restarted. Once the
reboot is complete, the nmap command on the PC can be used to check whether the
installation of xrdp was successful:

$ nmap jetson
Starting Nmap 7.70 ( https://nmap.org ) at 2019-04-13 01:39 BST
Nmap scan report for jetson (192.168.1.118)
Host is up (0.0025s latency).
Not shown: 996 closed ports
PORT STATE SERVICE
22/tcp open ssh
111/tcp open rpcbind
3389/tcp open ms-wbt-server
Nmap done: 1 IP address (1 host up) scanned in 1.25 seconds
$

Although the Remote Desktop Protocol (RDP) is a proprietary protocol, Microsoft provides a viewer for it free of charge for most platforms. This viewer must be installed. On a Windows system, the desktop must be added in the Microsoft Remote Desktop programme via „Add Desktop“.

Once the settings have been configured, the session can be set in full screen mode by
setting a reasonable resolution for the resulting windowed desktop.
If the PC is connected to the board via RDP, the desktop will look slightly different.
This is because a standard Ubuntu desktop is displayed, running Gnome, rather than
the older Unity-style desktop that is the default for L4T. This means that a separate
desktop is created for the RDP. This desktop remains until the user logs off, closing
the RDP window does not close the desktop. This means that the next time the RDP
window is opened, the desktop will still be available.
However, it should be noted that one cannot be logged on to the physical desktop and open an RDP desktop at the same time.
If the Jetson Nano is accessed via a wireless network, open the Gnome application „Settings“, select „Power“ from the menu on the left and ensure that „Turn off Wi-Fi to save power“ is turned off.
Now the monitor, keyboard and mouse can be unplugged, as they are no longer needed.

9.8. Software installation


9.8.1. Installing pip
Python’s default package installer pip is not pre-installed on the Jetson Nano. The
installation is simple:

sudo apt install python3-pip

9.8.2. Installing cmake and git


An extensive initial demonstration project entitled „Hello AI World“ can be downloaded from the project’s GitHub repository. For this, the programmes cmake and git must first be installed:

$ sudo apt-get install cmake


$ sudo apt-get install git

9.9. Installing the GPIO-Library


The library for managing and using the GPIO interface is provided by NVIDIA in a
GitHub project. With the help of the command

$ git clone https://github.com/NVIDIA/jetson-gpio

the project can be copied to the Jetson Nano. The library can then be installed. In the directory where the setup.py file of the project is located, the command

$ sudo python3 setup.py install

can be executed. To be able to work with the library, the user needs the corresponding rights. To do this, a GPIO user group is first created using

$ sudo groupadd -f -r gpio

Then the corresponding user rights can be set by adding a user to this group:

$ sudo usermod -a -G gpio msrjetson

The programme udev manages the devices. The user rules must now be communicated
to this programme. To do this, copy the file 99-gpio.rules into the rules.d directory.

$ sudo cp lib/python/Jetson/GPIO/99-gpio.rules /etc/udev/rules.d/

These rules are only used after a restart of the system. Alternatively, the rules can
also be loaded actively:

$ sudo udevadm control --reload-rules && sudo udevadm trigger

The GitHub project also contains sample programmes located in the samples folder. These can be used for testing. One example is the programme simple_out.py. It uses the BCM pin numbering mode of the Raspberry Pi and outputs alternating high and low values on BCM pin 18, i.e. board pin 12, every 2 seconds. This can be used to make an LED flash. The programme is called with the following command:

$ python3 simple_out.py
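The following is a minimal sketch of this kind of blink programme using the Jetson.GPIO library; it is not the original simple_out.py, but it uses the same BCM numbering and output pin described above:

import time
import Jetson.GPIO as GPIO

OUTPUT_PIN = 18          # BCM pin 18 corresponds to board pin 12

GPIO.setmode(GPIO.BCM)   # use the Raspberry Pi BCM numbering scheme
GPIO.setup(OUTPUT_PIN, GPIO.OUT, initial=GPIO.HIGH)

try:
    value = GPIO.HIGH
    while True:
        time.sleep(2)                                    # toggle every 2 seconds
        value = GPIO.LOW if value == GPIO.HIGH else GPIO.HIGH
        GPIO.output(OUTPUT_PIN, value)
except KeyboardInterrupt:
    pass
finally:
    GPIO.cleanup()                                       # release the pin on exit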

Figure 9.13.: Raspberry Pi Camera Module V2[Ras19]

9.10. Raspberry Pi Camera Module V2


The Raspberry Pi camera module has a Sony IMX219 image sensor with a resolution of 3280 × 2464 pixels. The camera software supports the JPEG, JPEG + RAW, GIF, BMP, PNG, YUV420 and RGB888 image formats and raw H.264 video. [Ras16] Figure 9.13 shows the camera module with the connection cable.

Specification                     Camera V2
Size                              25 mm × 23 mm × 9 mm
Weight                            3 g
Still image resolution            8 megapixels
Video modes                       1080p30, 720p60 and 640×480p60/90
Sensor                            Sony IMX219
Sensor resolution                 3280 × 2464 pixels
Sensor image area                 3.68 mm × 2.76 mm
Pixel size                        1.12 µm × 1.12 µm
Optical size                      1/4"
Full-frame SLR lens equivalent    35 mm
Focal length                      3.04 mm
Horizontal field of view          62.2°
Vertical field of view            48.8°
Focal length ratio                2.0

Table 9.3.: Raspberry Pi Camera Module V2 Specifications [Ras16]

9.10.1. Camera resolution


The v4l-utils tool is helpful for exploring the camera. It is installed with
$ sudo apt-get install v4l-utils

The available formats are then queried with the following command:
$ v4l2-ctl --list-formats-ext

9.10.2. Camera cable


The standard cable is very flexible and has the same contacts on each side; however, the metallic contacts are only on one side. Figure 9.14 shows one cable and the side with the contacts. Since this is a very flexible and thin cable, the risk of breakage from mechanical stress is high. The supplied cable has a length of about 100 mm. If longer distances need to be covered, cable lengths of up to 2000 mm are also available.

Figure 9.14.: Cable to the camera [Ras19]

9.10.3. CSI-3-Interface
The camera is controlled via a CSI-3 interface, which is regularly used to integrate
digital still cameras, high-resolution and high-frame-rate sensors, teleconferencing and
camcorder functionalities on a UniPro network.
A basic CSI-3 v1.1 link configuration using four forward lanes and one reverse lane
(10 total wires) can support up to 14.88 Gbps (usable bit rate, including 8B10B and
UniPro overhead) in the forward direction and typically supports 1 Mbps or more in
the reverse direction. The UniPro stack itself uses some link bandwidth (primarily in
the reverse direction) for the purpose of guaranteeing reliable packet delivery to the
receiver. Cameras implementing a minimal MIPI CSI-3 configuration consisting of one forward and one reverse lane (four total wires) can transmit 12 BPP 4K video at about 40 FPS.

9.10.4. Starting up the Raspberry Pi Camera with the Jetson Nano


The Raspberry Pi camera module is supported plug-and-play by the Jetson Nano, which makes commissioning easy. The device drivers are already installed with the operating system. Once the module is connected, the camera should show up as a device under /dev/video0. If this is not the case, the plug connections must be checked. Especially during transport, the plug-in connection on the top of the board can come loose.
For the next steps the programme GStreamer is helpful [GSt16]. It can be called as a stand-alone programme, but it can also be integrated into your own programme. It enables the creation of single images and image sequences. GStreamer can be installed with the following commands:
apt-get install libgstreamer1.0-0 gstreamer1.0-plugins-base \
gstreamer1.0-plugins-good gstreamer1.0-plugins-bad \
gstreamer1.0-plugins-ugly gstreamer1.0-libav gstreamer1.0-doc \
gstreamer1.0-tools gstreamer1.0-x gstreamer1.0-alsa \
gstreamer1.0-gl gstreamer1.0-gtk3 gstreamer1.0-qt5 \
gstreamer1.0-pulseaudio
By entering
$ gst-launch-1.0 nvarguscamerasrc sensor_id=0 ! nvoverlaysink
the camera can be tested. The command creates a camera image in full-screen mode. To end the display, Ctrl + C must be pressed.
Other settings can also be applied to the camera image. To request the GStreamer programme to open the live image of the camera with a width of 3280 pixels and a height of 2464 pixels at 21 frames per second and to display it in a window with a width of 960 pixels and a height of 616 pixels, the following line is entered:
$ gst-launch-1.0 nvarguscamerasrc sensor_mode=0 !\
  'video/x-raw(memory:NVMM),width=3280, height=2464,\
  framerate=21/1, format=NV12' ! nvvidconv flip-method=0 ! \
  'video/x-raw,width=960, height=616' ! nvvidconv ! \
  nvegltransform ! nveglglessink -e
When the camera is attached to the bracket on the Jetson Nano, the image may initially be upside down. Setting flip-method=2 rotates the image by 180°. For more options to rotate and flip the image, see table 9.4.

none                   0   no rotation
clockwise              1   rotate clockwise by 90°
rotate-180             2   rotate by 180°
counterclockwise       3   rotate counterclockwise by 90°
horizontal-flip        4   flip horizontally
vertical-flip          5   flip vertically
upper-left-diagonal    6   flip over the upper-left/lower-right diagonal
upper-right-diagonal   7   flip over the upper-right/lower-left diagonal
automatic              8   flip method based on the image-orientation tag

Table 9.4.: Operations to rotate and mirror the image in GStreamer

The documentation for GStreamer can be found at [GSt16] and in a GitHub project. The documentation can be obtained from the GitHub project via
git clone https://gitlab.freedesktop.org/gstreamer/gst-docs

Further plug-ins are also documented there.


10. Camera Calibration using OpenCV
The camera is used as a visual sensor in this application. To use it effectively as a visual sensor, it is essential to know the parameters of the camera. The process of estimating the parameters of the camera is called camera calibration. When a camera looks at 3D objects in the real world and transforms them into a 2D image, the transformation is not perfect. Sometimes the images are distorted: edges appear bent, rounded or stretched outward. It is necessary to correct this distortion in use cases where the information from the image is crucial.

10.1. Calibration using Checkerboard


In order to reduce the distortion, it can be captured by five numbers called distortion coefficients, whose values reflect the amount of radial and tangential distortion in an image. If we know the values of all the coefficients, we can use them to calibrate our camera and undistort the distorted images. The camera calibration was done using chessboard patterns. A chessboard is great for calibration because its regular, high-contrast pattern makes it easy to detect automatically. The corners of the squares on the checkerboard are ideal for localisation because they have sharp gradients in two directions. In addition, these corners are related by the fact that they lie at the intersections of checkerboard lines.
For the 3D points we photograph a checkerboard pattern with known dimensions at many different orientations. The world coordinate system is attached to the checkerboard, and since all the corner points lie on a plane, we can arbitrarily choose Z_w for every point to be 0. Since the points are equally spaced on the checkerboard, the (X_w, Y_w) coordinates of each 3D point are easily defined by taking one point as the reference (0, 0) and defining the remaining points with respect to that reference point.

10.2. Method to Calibrate


The following steps are used to calibrate the camera and remove the distortion from the images, in brief:

• Step 1: Finding the corners of the chessboard. Automatic detection of the corners is done using the OpenCV functions findChessboardCorners() and drawChessboardCorners().

• Step 2: Calibrating the camera. To calibrate the camera, the first step is to read in calibration images of a chessboard. It is recommended to use at least 20 images to get a reliable calibration [SM20]. To calibrate the camera, OpenCV provides the calibrateCamera() function:
  retval, cameraMatrix, distCoeffs, rvecs, tvecs = cv2.calibrateCamera(objectPoints, imagePoints, imageSize)
  The calibrateCamera() function needs object points and image points. The coordinates of the corners in the 2D displayed image, called image points, are mapped to the 3D coordinates of the real, undistorted chessboard corners, called object points. The z coordinates stay zero, so they are left as they are; for the first two columns, x and y, NumPy's mgrid function is used to generate the coordinates we want. mgrid returns the coordinate values for a given grid size, and these coordinates are reshaped into two columns, one for x and one for y. The following are returned by this function:

  – cameraMatrix: the camera matrix, which helps to transform 3D object points to 2D image points.

  – Distortion coefficients

  – The position of the camera in the world, given as the rotation and translation vectors rvecs and tvecs

• Step 3: Undistorting images. The undistort function takes a distorted image, the camera matrix and the distortion coefficients and returns an undistorted image, often called the destination image. OpenCV provides the function cv2.undistort(), which takes the camera matrix, the distortion coefficients and the image:
  dst = cv2.undistort(img, camera_matrix, dist_coefs, None, newcameramtx)

Note: More details about the concept of distortion in cameras, a step-by-step procedure to calibrate the camera matrix and distortion coefficients, the removal of distortion and the corresponding Python programs are explained in the file Distortion_Camera_Calibration_OpenCV.tex in the directory “JetsonNano_Inhalt_Software”.

10.2.1. Evaluation of Results


Below are the camera matrix and distortion coefficients obtained for the Logitech webcam:

Figure 10.1.: Camera matrix calculated by the cameracalibrate function for the logitech
webcam

Figure 10.2.: Distortion coefficients returned

Below is the image before and after distortion removal by the Python program:

Figure 10.3.: Image of the chessboard before and after removing distortion

There is no noticeable difference between the original image and the image after distortion removal. However, it is important to remove the distortion, as the real-world coordinates of the object's centroid must be calculated from the pixel coordinates.
11. Distortion and Camera Calibration
The process of estimating the parameters of the camera is called camera calibration. When a camera looks at 3D objects in the real world and transforms them into a 2D image, the transformation is not perfect. Sometimes the images are distorted: edges appear bent, rounded or stretched outward. It is necessary to correct this distortion in use cases where the information from the image is crucial.

11.1. Why Distortion?


When a camera looks at an object, it sees the world similarly to how our eyes do: by focusing the light that is reflected off objects in the world. In the picture below, through a small pinhole, the camera focuses the light that is reflected off a 3D traffic sign and forms a 2D image at the back of the camera. Mathematically, the transformation from 3D object points P = (X, Y, Z) to 2D image points (x, y) is performed by a transformation matrix called the camera matrix C, which we can use to calibrate the camera. However, real cameras do not use tiny pinholes; they use lenses to focus multiple light rays at a time, which allows them to form images quickly. But lenses can introduce distortion too: light rays often bend a little too much at the edges of a curved lens, and this creates an effect that distorts the edges of the images.
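Written out, the pinhole model takes the following standard form (a sketch using common notation, not taken from [Kum19]): a world point (X, Y, Z) is mapped to pixel coordinates (u, v) via the intrinsic matrix K and the extrinsic parameters [R | t]:

\[
s \begin{pmatrix} u \\ v \\ 1 \end{pmatrix}
  = K \, [R \mid t] \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix},
\qquad
K = \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}
\]

Here f_x and f_y are the focal lengths expressed in pixels, (c_x, c_y) is the optical centre and s is an arbitrary scale factor.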

Figure 11.1.: Working of a camera [Kum19]

11.1.1. Types of Distortion


Radial distortion: Radial distortion is the most common type affecting images; with it, straight lines captured by the camera appear slightly curved or bent.

Figure 11.2.: Example of a Radial Distortion [Kum19]

117
118

Tangential distortion: Tangential distortion occurs mainly because the lens is not aligned parallel to the imaging plane. It makes the image appear stretched or tilted, so that objects seem farther away or closer than they actually are.

Figure 11.3.: Example of a Tangential Distortion [Kum19]

In order to reduce the distortion, it can be captured by five numbers called distortion coefficients, whose values reflect the amount of radial and tangential distortion in an image (a sketch of the underlying model follows the list below). If we know the values of all the coefficients, we can use them to calibrate our camera and undistort the distorted images.
This means we have all the information (parameters or coefficients) about the camera required to determine an accurate relationship between a 3D point in the real world and its corresponding 2D projection (pixel) in the image captured by that calibrated camera.
Typically this means recovering two kinds of parameters:

• Intrinsic parameters: internal parameters of the camera/lens system, e.g. focal length, optical centre and the radial distortion coefficients of the lens.

• Extrinsic parameters: the orientation (rotation and translation) of the camera with respect to some world coordinate system.
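In OpenCV the five distortion coefficients are ordered as (k_1, k_2, p_1, p_2, k_3). As a sketch in standard notation (assumed here rather than quoted from [SM20]), a normalised image point (x, y) with r^2 = x^2 + y^2 is distorted radially and tangentially as follows:

\[
\begin{aligned}
x_{\text{distorted}} &= x\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + 2 p_1 x y + p_2 (r^2 + 2 x^2),\\
y_{\text{distorted}} &= y\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + p_1 (r^2 + 2 y^2) + 2 p_2 x y.
\end{aligned}
\]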

11.2. Calibration using Checkerboard

A chessboard is great for calibration because its regular, high-contrast pattern makes it easy to detect automatically. Checkerboard patterns are distinct and easy to detect in an image. Moreover, the corners of the squares on the checkerboard are ideal for localisation because they have sharp gradients in two directions. In addition, these corners are related by the fact that they lie at the intersections of checkerboard lines. All these facts are used to robustly locate the corners of the squares in a checkerboard pattern.

Figure 11.4.: Camera Calibration flow chart [SM20]

The image below shows the difference between an image of the chessboard with and without distortion.

Figure 11.5.: Effect of distortion on Checkerboard images [SM20]

In the process of calibration we calculate the camera parameters from a set of known 3D points (X_w, Y_w, Z_w) and their corresponding pixel locations (u, v) in the image. For the 3D points we photograph a checkerboard pattern with known dimensions at many different orientations. The world coordinate system is attached to the checkerboard, and since all the corner points lie on a plane, we can arbitrarily choose Z_w for every point to be 0. Since the points are equally spaced on the checkerboard, the (X_w, Y_w) coordinates of each 3D point are easily defined by taking one point as the reference (0, 0) and defining the remaining points with respect to that reference point.

11.3. Method to Calibrate


11.3.1. Step1: Finding the corners of the chessboard
OpenCV helps to automatically detect the corners and draw them using findChessboardCorners() and drawChessboardCorners().
The image below shows the sample image that is passed to the OpenCV functions to identify and draw the corners:

Figure 11.6.: The Image of the chessboard passed to the opencv functions to identify
and draw corners

The resultant image is as below:

Figure 11.7.: Result after detection of corners

Below is the complete code to perform the above operation:



# Code from https://medium.com/analytics-vidhya/camera-calibration-with-opencv-f32

import numpy as np
import cv2
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

# prepare object points
nx = 8  # number of inside corners in x
ny = 6  # number of inside corners in y

# Make a list of calibration images
fname = 'calibration_test.png'
img = cv2.imread(fname)

# Convert to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Find the chessboard corners
ret, corners = cv2.findChessboardCorners(gray, (nx, ny), None)

# If found, draw corners
if ret == True:
    # Draw and display the corners
    cv2.drawChessboardCorners(img, (nx, ny), corners, ret)
    plt.imshow(img)

Listing 11.1.: Script to find and mark the corners on a chessboard image

11.3.2. Step2: Calibrating the camera


In order to calibrate the camera, the first step is to read in calibration images of a chessboard. It is recommended to use at least 20 images to get a reliable calibration [SM20].
To calibrate the camera, OpenCV provides the calibrateCamera() function:
retval, cameraMatrix, distCoeffs, rvecs, tvecs = cv2.calibrateCamera(objectPoints, imagePoints, imageSize)
The calibrateCamera() function needs object points and image points. The coordinates of the corners in the 2D displayed image, called image points, are mapped to the 3D coordinates of the real, undistorted chessboard corners, called object points.
The z coordinates stay zero, so they are left as they are; for the first two columns, x and y, NumPy's mgrid function is used to generate the coordinates we want. mgrid returns the coordinate values for a given grid size, and these coordinates are reshaped into two columns, one for x and one for y (see the short sketch after the following list).
The following are returned by this function:

• cameraMatrix: the camera matrix, which helps to transform 3D object points to 2D image points.

• Distortion coefficients

• The position of the camera in the world, given as the rotation and translation vectors rvecs and tvecs
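As a small illustration of how the object points can be prepared with np.mgrid (a sketch assuming a 9 × 6 pattern of inner corners and a unit square size, not code taken from the listings below):

import numpy as np

nx, ny = 9, 6                                          # inner corners per row / column
objp = np.zeros((nx * ny, 3), np.float32)              # one 3D point per corner, z stays 0
objp[:, :2] = np.mgrid[0:nx, 0:ny].T.reshape(-1, 2)    # fill x and y with the grid indices
# objp is appended to objectPoints once for every calibration image
# in which the chessboard corners were found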

The pictures below show the camera matrix and distortion coefficients calculated by the function.

Figure 11.8.: Camera matrix calculated by the cameracalibrate function for the logitech
webcam

Figure 11.9.: Distortion coefficients returned

11.3.3. Step3: Undistort images

The undistort function takes a distorted image, the camera matrix and the distortion coefficients and returns an undistorted image, often called the destination image. OpenCV provides the function cv2.undistort(), which takes the camera matrix, the distortion coefficients and the image:
dst = cv2.undistort(img, camera_matrix, dist_coefs, None, newcameramtx)

Figure 11.10.: Image of the chessboard before and after removing distortion

Below is the script to calibrate and undistort an image.



#!/usr/bin/env python

'''
camera calibration for distorted images with chess board samples
reads distorted images, calculates the calibration and write undistorted images
usage:
    calibrate.py [--out <output path>] [--square_size] [<image mask>]
default values:
    --out: . sample/output/
    --square_size: 1.0
    <image mask> defaults to .out/chessboard/*.jpg

Code forked from OpenCV:
https://github.com/opencv/opencv/blob/a8e2922467bb5bb8809fc8ae6b7233893c6db917/sa
released under BSD 3 license
'''

# Python 2/3 compatibility
from __future__ import print_function
# local modules
from common import splitfn
# built-in modules
import os
import sys
from glob import glob
import numpy as np
import cv2
import logging
import argparse

logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.DEBUG)

if __name__ == '__main__':

    # Parse arguments
    parser = argparse.ArgumentParser(description='Generate camera matrix and '
                                                 'distortion parameters from checkerboard '
                                                 'images')
    parser.add_argument('images', help='path to images')
    parser.add_argument('pattern_x', metavar='X', default=9, type=int,
                        help='pattern x')
    parser.add_argument('pattern_y', metavar='Y', default=6, type=int,
                        help='pattern y')
    parser.add_argument('--out', help='optional path for output')
    parser.add_argument('--square_size', default=1.0)
    args = parser.parse_args()

    logging.debug(args)

    # get images into a list
    extensions = ['jpg', 'JPG', 'png']

    if os.path.isdir(args.images):
        img_names = [fn for fn in os.listdir(args.images)
Listing 11.2.: Script to calibrate and undistort an image - part 1



                     if any(fn.endswith(ext) for ext in extensions)]
        proj_root = args.images
    else:
        logging.error("%s is not a directory" % args.images)
        exit()

    if not args.out:
        out = os.path.join(proj_root, 'out')
    else:
        out = args.out

    if not os.path.isdir(out):
        os.mkdir(out)

    square_size = float(args.square_size)

    pattern_size = (args.pattern_x - 1, args.pattern_y - 1)  # For some reason you
    pattern_points = np.zeros((np.prod(pattern_size), 3), np.float32)
    pattern_points[:, :2] = np.indices(pattern_size).T.reshape(-1, 2)
    pattern_points *= square_size
    obj_points = []
    img_points = []
    h, w = 0, 0
    img_names_undistort = []
    print('img: ', img_names)
    # for loop to read image names and the images from the folder
    for fn in img_names:
        print('processing %s... ' % fn, end='')
        img = cv2.imread(os.path.join(proj_root, fn), 0)
        if img is None:
            print("Failed to load", fn)
            continue
        h, w = img.shape[:2]
        found, corners = cv2.findChessboardCorners(img, pattern_size)
        # the above function finds all the intersection points on the chess board
        if found:
            term = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_COUNT, 30, 0.1)
            # cv2.TERM_CRITERIA_EPS - stop the algorithm iteration if specified ac
            cv2.cornerSubPix(img, corners, (5, 5), (-1, -1), term)
            # cv2.cornerSubPix(image, corners, winSize, zeroZone, criteria)
            # This function uses the dot product trick and iteratively refines the
        if out:
            vis = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
            # cv2.cvtColor() method is used to convert an image from one color spa
            cv2.drawChessboardCorners(vis, pattern_size, corners, found)
            # connects all the intersection points on the chess board
            path, name, ext = splitfn(fn)
            outfile = os.path.join(out, name + '_chess.png')
            cv2.imwrite(outfile, vis)
            if found:
                img_names_undistort.append(outfile)

        if not found:
            print('chessboard not found')
            continue

Listing 11.3.: Script to calibrate and undistort an image - part 2



        img_points.append(corners.reshape(-1, 2))
        obj_points.append(pattern_points)

        print('ok')

    # calculate camera distortion
    rms, camera_matrix, dist_coefs, rvecs, tvecs = cv2.calibrateCamera(
        obj_points, img_points, (w, h), None, None)

    print("\nRMS:", rms)
    print("camera matrix:\n", camera_matrix)
    # print("matrix:\n", type(camera_matrix))
    print("distortion coefficients: ", dist_coefs.ravel())

    # write the matrix to be used as input
    with open(os.path.join(out, "matrix.txt"), "w") as matf:
        camera_matrix.reshape((3, 3))
        np.savetxt(matf, (camera_matrix[0], camera_matrix[1], camera_matrix[2]),
                   fmt='%.12f')

    with open(os.path.join(out, "distortion.txt"), "w") as distf:
        np.savetxt(distf, dist_coefs.ravel(), fmt='%.12f')

    # undistort the image with the calibration
    for img_found in img_names_undistort:
        img = cv2.imread(img_found)

        h, w = img.shape[:2]
        newcameramtx, roi = cv2.getOptimalNewCameraMatrix(camera_matrix, dist_coefs,
                                                          (w, h), 1, (w, h))

        dst = cv2.undistort(img, camera_matrix, dist_coefs, None, newcameramtx)

        # crop and save the image
        x, y, w, h = roi
        dst = dst[y:y + h, x:x + w]
        path, name, ext = splitfn(img_found)
        outfile = os.path.join(path, name + '_undistorted.png')
        cv2.imwrite(outfile, dst)
        print('Undistorted image written to: %s' % outfile)

    cv2.destroyAllWindows()

Listing 11.4.: Script to calibrate and undistort an image - part 3

11.3.4. Use
12. Arducam IMX477
IMX477 HQ camera board for Jetson: a 12.3 MP camera board for the NVIDIA Jetson Nano/Xavier NX and the Raspberry Pi Compute Module.
Colour: IMX477 camera board for Jetsons.
A can't-miss upgrade over the IMX219 - much better image quality and lens flexibility thanks to 53% more pixels, 92% larger pixel size, 192% larger image area and a CS mount. Maximum still image resolution of 4056 × 3040; frame rates: 30 fps at the full 12.3 MP, 60 fps at 1080p.
Accelerated by the Jetson hardware ISP engine - supports the NVIDIA Argus camera plug-in for H.264 encoding, JPEG snapshots as well as gain, exposure, frame rate and group hold control.
4-lane camera for the future - all four data lanes are routed to the camera connector for customised carrier boards with a 4-lane CSI connector and future 4-lane camera driver updates. For the new version (R32 4.4), please go to the link:
https://github.com/ArduCAM/MIPI_Camera/releases
Note:

• A C/CS lens is not included.

• Raspberry Pi 2/3/4/Zero boards do not work with this camera; only the Compute Modules do.

• The camera driver must be installed:
https://github.com/ArduCAM/MIPI_Camera/Releases/Tag/v0.0.2;
for the new version (R32 4.4), please go to the link: https://github.com/ArduCAM/MIPI_Camera/releases

• It can also be used on the Raspberry Pi Compute Module (such as CM3, CM3+), but it is not compatible with standard Raspberry Pi models.

International products have separate terms, are sold from abroad and may differ
from local products, including fit, age ratings, and language of product, labeling or
instructions

Figure 12.1.: IMX477 camera from Arducam [Ard21]


Arducam 12.3 MP IMX477 HQ camera module with 6 mm CS lens and tripod mount for the NVIDIA Jetson Nano/Xavier NX and the Raspberry Pi Compute Module:
https://www.amazon.de/Arducam-Kamera-Modul-Tipod-Halterung-Raspberry-Computermodul/dp/B08SC2QY1B/ref=pd_sbs_3/261-1564509-7527066?pd_rd_w=Bodcr&pf_rd_p=a0a2bb41-2b9d-47ea-9dff-8a3ade3a13d6&pf_rd_r=2W868QDFVHRJCGGFVZ9Q&pd_rd_r=0f94f9d1-3f4c-47c5-a17f-4dc798a15aaa&pd_rd_wg=TBqO1&pd_rd_i=B08SC2QY1B&psc=1

12.1. What are the Jetson Native Cameras


The native cameras are the camera modules that are officially supported by the Jetson hardware with driver and ISP support. At the moment, two camera modules, the IMX219 and the IMX477, are natively supported; they cover most use cases.

12.2. What is the Advantage of Jetson Native Cameras


If you are new to the Jetson and want to use the camera with existing online open-source code that uses native camera commands such as nvgstcapture/nvarguscamera or the nvargus API, the native cameras are the most straightforward choice. Further information can be found at:
https://www.arducam.com/docs/camera-for-jetson-nano/native-jetson-cameras-imx219-imx477/
imx477/
https://www.arducam.com/docs/camera-for-jetson-nano/native-jetson-cameras-imx219-imx477/
imx477-ir-cut-camera/

• https://github.com/JetsonHacksNano/CSI-Camera/blob/master/simple_
camera.py

• https://www.arducam.com/docs/camera-for-jetson-nano/native-jetson-cameras-imx219-imx4
imx477-troubleshoot/

• https://www.arducam.com/sony/imx477/
• https://www.uctronics.com/camera-modules/camera-for-raspberry-pi/
high-quality-camera-raspberry-pi-12mp-imx477.html


# MIT License
# Copyright (c) 2019 JetsonHacks
# See license
# Using a CSI camera (such as the Raspberry Pi Version 2) connected to a
# NVIDIA Jetson Nano Developer Kit using OpenCV
# Drivers for the camera and OpenCV are included in the base image

import cv2

# gstreamer_pipeline returns a GStreamer pipeline for capturing from the CSI camera
# Defaults to 1280x720 @ 60fps
# Flip the image by setting the flip_method (most common values: 0 and 2)
# display_width and display_height determine the size of the window on the screen

def gstreamer_pipeline(
    capture_width=1280,
    capture_height=720,
    display_width=1280,
    display_height=720,
    framerate=60,
    flip_method=0,
):
    return (
        "nvarguscamerasrc ! "
        "video/x-raw(memory:NVMM), "
        "width=(int)%d, height=(int)%d, "
        "format=(string)NV12, framerate=(fraction)%d/1 ! "
        "nvvidconv flip-method=%d ! "
        "video/x-raw, width=(int)%d, height=(int)%d, format=(string)BGRx ! "
        "videoconvert ! "
        "video/x-raw, format=(string)BGR ! appsink"
        % (
            capture_width,
            capture_height,
            framerate,
            flip_method,
            display_width,
            display_height,
        )
    )

def show_camera():
    # To flip the image, modify the flip_method parameter (0 and 2 are the most common)
    print(gstreamer_pipeline(flip_method=0))
    cap = cv2.VideoCapture(gstreamer_pipeline(flip_method=0), cv2.CAP_GSTREAMER)
    if cap.isOpened():
        window_handle = cv2.namedWindow("CSI Camera", cv2.WINDOW_AUTOSIZE)
        # Window
        while cv2.getWindowProperty("CSI Camera", 0) >= 0:
            ret_val, img = cap.read()
            cv2.imshow("CSI Camera", img)
            # This also acts as
            keyCode = cv2.waitKey(30) & 0xFF
            # Stop the program on the ESC key
            if keyCode == 27:
                break
        cap.release()
        cv2.destroyAllWindows()
    else:
        print("Unable to open camera")
13. Lens Calibration Tool
Arducam Lens Calibration Tool, Field of View (FoV) Test Chart Folding Card, Pack of 2
https://www.amazon.com/-/de/dp/B0872Q1RLD/ref=sr_1_40?__mk_de_DE=ÅMÅŽ~O~N&
dchild=1&keywords=arducam&qid=1622358908&sr=8-40

Multifunctional for lenses: calibrate the lens focus, measure the FoV and estimate sharpness.
Easy to use: quick setup in seconds and measurement in minutes with an online video tutorial.
A handy tool: easily determine the field of view of the lens without calculation.
Clean and tidy: foldable card style, folded and rolled up multiple times for quick setup and better storage.
Application: focus calibration, sharpness estimation and quick field-of-view measurement for M12, CS-mount, C-mount and DSLR lenses.
https://www.arducam.com/product/arducam-lens-calibration-tool-field-of-view-fov-test-char

13.1. Overview
Are you still frustrated by calculating the FOV of your lenses and by blurry images? Arducam has now released a multifunctional lens tool that lets you determine the field of view of a lens without calculation and calibrate the lens focus quickly and easily.

13.2. Applications
Focus calibration, sharpness estimation and field of view quick measuring for M12,
CS-Mount, C-Mount, and DSLR lenses

Figure 13.1.: Calibration tool from Arducam [Ard21]


13.3. Package Contents


2 × foldable lens calibration card
14. Jetson Nano MSR-Lab
The MSR lab’s Jetson Nano is housed in a metal case with a power and reset button
embedded and the camera attached to the bracket provided (Figure 14.1).

Figure 14.1.: Jetson Nano MSR-Lab in case with Pi camera, front view

On the side of the Jetson Nano (figure 14.2), the required connections - USB, HDMI, Ethernet, micro-USB and the microSD card slot - are accessible.

Figure 14.2.: Connections of the Jetson Nano MSR-Lab in the case

At the back of the Jetson Nano (Figure 14.3) is the J41 header with the GPIO pins.


Figure 14.3.: J41 header of the Jetson Nano MSR-Lab in the case, rear view

14.1. GPIOs
The pinout of the Jetson Nano and the instructions for installing the required libraries can be found in chapter 9.

14.1.1. Control via the terminal


The GPIOs can be controlled directly via the terminal. This is especially helpful for
testing the functionality. For example, a circuit can be built with an LED that is
controlled via board pin 12. A possible circuit is shown in the chapter 14.1.3.
To get direct access to the GPIOs, you have to be a ’super user’.
$ sudo su
Now you can view the current configuration of the pins:
$ cat /sys/kernel/debug/gpio
To define pin 12 of the Jetson Nano as an output, enter
$ echo 79 > /sys/class/gpio/export
$ echo out > /sys/class/gpio/gpio79/direction
Now the output of the pin can be set to high (3.3 V) or low (0 V):
$ echo 1 > /sys/class/gpio/gpio79/value
$ echo 0 > /sys/class/gpio/gpio79/value
Finally, the pin should be released again:
$ echo 79 > /sys/class/gpio/unexport

14.1.2. Using the GPIO pins in a Python programme


If the GPIO pins are to be controlled from a Python programme, the following commands are elementary.
First, the library must be imported (before this, the steps described in chapter 9 for installing the library must be carried out). So that the module can be accessed under the name 'GPIO', the specification 'as GPIO' is appended:

import Jetson.GPIO as GPIO

The Jetson GPIO library offers four options for numbering the I/O pins, so the numbering scheme to be used must be specified:

GPIO.setmode(GPIO.BOARD)
or
GPIO.setmode(GPIO.BCM)
or
GPIO.setmode(GPIO.CVM)
or
GPIO.setmode(GPIO.TEGRA_SOC)

To set up a GPIO channel as an input or an output, the commands

GPIO.setup(channel, GPIO.IN)
or
GPIO.setup(channel, GPIO.OUT)

are used, where channel specifies the pin in the chosen numbering scheme. The state of an output pin prepared in this way can then be changed via

GPIO.output(channel, GPIO.HIGH)
and
GPIO.output(channel, GPIO.LOW)
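Put together, a minimal sketch for driving an output such as the LED circuit described in the next section (14.1.3) via board pin 12 could look as follows; the pin number and timing are only examples:

import time
import Jetson.GPIO as GPIO

LED_PIN = 12                     # board pin 12 (see the circuit in section 14.1.3)

GPIO.setmode(GPIO.BOARD)         # use the physical board numbering
GPIO.setup(LED_PIN, GPIO.OUT)

try:
    for _ in range(10):          # blink the LED ten times
        GPIO.output(LED_PIN, GPIO.HIGH)
        time.sleep(0.5)
        GPIO.output(LED_PIN, GPIO.LOW)
        time.sleep(0.5)
finally:
    GPIO.cleanup()               # release the pin again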

14.1.3. Visualisation with an LED


To visualise the switching of the GPIO pins, LEDs can be connected to the Jetson Nano. A possible circuit for this can be seen in figure 14.4. It is explained in more detail in [Jet19]. On the left side you can see the pins to which the elements are connected. Here, pin 2 provides 5 V, pin 6 is GND and pin 12 is switched to turn the LED on and off.

Figure 14.4.: Circuit for testing the GPIO pins with an LED (330 Ω series resistor, LED, 10 kΩ base resistor and BC337 transistor driven from pin 12)

A photo of the circuit can be seen in figure 14.5. The cable on the left in this photo is connected to pin 2, the one in the middle to pin 6 and the one on the right to pin 12.

Figure 14.5.: Circuit for testing the GPIO pins with LED
15. “Hello World” - Image Recognition
This chapter shows how the Jetson Nano can be used for image recognition. However, no network of our own is trained; instead, an already trained network is used. An example project from NVIDIA is used for this purpose. In the example, an attempt is made to recognise bottles. The result is a probability; a value of over 70% is considered good.

15.1. Loading the Example Project from GitHub


NVIDIA provides a sample project via GitHub, which is loaded and configured as
follows.
$ git clone https://github.com/dusty-nv/jetson-inference
Afterwards, a directory jetson-inference is available that contains all files. Now change
to this directory:
$ cd jetson-inference
Now the submodules must be initialised:
$ git submodule update --init
However, the programme is not available as an executable file, but must still be created.
To do this, a directory for the executable file is first created:
$ mkdir build
After changing to the build directory, the programme is created:
$ cd build
$ cmake ../
$ make
$ sudo make install
Patience is needed here, as the process takes a while. The executable files are then
located in the directory jetson-inference/build/aarch64/bin.

15.2. Example Applications


The following sample applications are all in the jetson-inference project. Either a
single image or an image from a live stream is classified. Each application then delivers
an image file with the surrounding rectangles of the detected objects as a result. In
addition, the coordinates of the surrounding rectangles and the probability are listed.
A hit probability of over 70% is usually considered good. In other words, these are
not production quality models. Since the Jetson Nano is used for inference, a trained
model must also be passed. The first time the applications are started, the model
is optimised; this process can take a few minutes. The next time the application is
called, the optimised model is used, so the process is much faster. Several pre-trained
models are available; they are all in the project. The table 15.1 lists the names of the
models, their name as a passing argument and their trained classes.

In order to be able to call up the applications, the directory is first changed:

$ cd ~/jetson-inference/build/aarch64/bin

Example files are provided in the images/ directory.

Model                  Argument        Classes
Detect-COCO-Airplane   coco-airplane   airplane
Detect-COCO-Bottle     coco-bottle     bottles
Detect-COCO-Chair      coco-chair      chairs
Detect-COCO-Dog        coco-dog        dogs
ped-100                pednet          pedestrian
multiped-500           multiped        pedestrians & luggage
facenet-120            facenet         faces

Table 15.1.: Trained models in the project jetson-inference


The coco-dog model is trained on COCO, the „Common Objects in Context“ dataset.

15.2.1. Launch of the First Machine Learning Model
The first application that can be tested is detectnet-console. It expects an image
as input, a target file and a trained net. The file including the surrounding rectangles
is stored in the target file. In addition, a list with the coordinates of the recognised
surrounding rectangles is output as a result. A probability for the result is also
provided.
In this example, a file with the name dog.jpg is examined. The result is written into the file out.jpg; note that an existing file out.jpg will be overwritten. Detect-COCO-Dog is used as the trained model.
$ ./detectnet-console ˜/dog.jpg out.jpg coco-dog
A result could be reported as follows:
1 bounding box detected
detected object 0 class #0 (dog) confidence=0.710427
bounding box 0 (334.687500, 488.742188) (1868.737549, 2298.121826) w=1534.050049 h=1809.379639
Here, the probability is given as 71.0%.

15.2.2. Using the Console Programme imagenet-console.py
The Python programme imagenet-console.py can also be used. The programme performs an inference for an image with the imageNet model. The programme also expects three parameters. The first parameter is the image to classify. The second parameter is a file name; in this file the image is stored overlaid with the bounding boxes. The third argument is the classification model.
The call to the application could look like this, for example:
$ ./imagenet-console.py --network=googlenet images/orange_0.jpg output_0.jpg

15.2.3. Live Camera Image Recognition with imagenet-camera.py


Another example is the evaluation of a live image using the application imagenet-
camera.py. It generates a live camera stream with OpenGL rendering and accepts
four optional command line arguments:
--network: Selection of the classification model. If no specification is made,
GoogleNet is used; this corresponds to --network=googlenet.
--camera: Specification of the camera device used. MIPI-CSI cameras are
identified by specifying the sensor index (0, 1, . . . ). The default is to use the
MIPI-CSI sensor 0, which is identical to the argument --camera=0.

--width: Setting the width of the camera resolution. The default value is 1280.

--height: Setting the height of the camera resolution. The default value is 720.
The resolution should be set to a format that the camera supports.

--help: Help text and further command line parameters

The use of these arguments can be combined as needed. To run the programme with
the GoogleNet network using the Pi camera with a default resolution of 1280 × 720,
entering this command is sufficient:
$ ./imagenet-camera.py

The OpenGL window displays the live camera stream, the name of the classified
object and the confidence level of the classified object along with the frame rate of the
network. The Jetson Nano provides up to 75 frames per second for the GoogleNet
and ResNet-18 models.
Since each individual image is analysed in this programme and the result is output,
rapid changes between the identified object classes may occur in the case of ambiguous
assignments. If this is not desired, modifications must be made.

15.3. Creating your Own Programme


To create your own Python programme, you must first set up the new project:
$ cd ~/
$ mkdir MyJetsonProject
$ cd MyJetsonProject
$ touch MyJetsonProject.py
$ chmod +x MyJetsonProject.py

In this case, a file MyJetsonProject.py is created in the directory MyJetsonProject.


This file is still empty. To create the programme, it can now be opened and edited in
an editor.
In the Linux world, bash scripting is often used. This includes the use of a shebang
sequence; this is a sequence of characters written in the first line of a file. It starts with
the two characters #!; this is followed by the path to the bash interpreter, which is
supposed to parse and interpret the rest of the file. In this case, the Python interpreter
is entered:

#!/usr/bin/python

import jetson.inference
import jetson.utils
import argparse

After the shebang sequence, the necessary import statements are added. Here these are
the modules jetson.inference and jetson.utils, which are used for loading images
and for image recognition. In addition, the standard package argparse is imported;
with the help of its functions, the command line can be parsed.
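A minimal classification script built from these modules could look as follows. This is only a sketch based on the examples in the jetson-inference project; the function names (loadImage, imageNet, Classify, GetClassDesc) are assumed from that project's Python API:

#!/usr/bin/python
import argparse
import jetson.inference
import jetson.utils

# parse the image file name and an optional network name from the command line
parser = argparse.ArgumentParser(description="Classify an image with imageNet")
parser.add_argument("filename", type=str, help="image file to classify")
parser.add_argument("--network", type=str, default="googlenet", help="model to use")
args = parser.parse_args()

img = jetson.utils.loadImage(args.filename)    # load the image into GPU memory
net = jetson.inference.imageNet(args.network)  # load the classification network
class_idx, confidence = net.Classify(img)      # run the inference
print("class #{:d} ({:s}) confidence={:f}".format(
    class_idx, net.GetClassDesc(class_idx), confidence))

Such a script could then be run, for example, with ./MyJetsonProject.py images/orange_0.jpg.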
Part III.

Object Detection with the


Jetson Nano

16. Python Installation
Python is commonly used for developing websites and software, task automation, data
analysis and data visualisation. Since it is relatively easy to learn, Python has been
adopted by many non-programmers, such as accountants and data scientists, for a
variety of everyday tasks like organising finances. It is an interpreted, object-oriented,
high-level programming language with dynamic semantics. At the time of writing,
the latest Python release is version 3.9.7; the installation is shown in Figure 16.1.

Figure 16.1.: Python Installation

17. Virtual environments in Python
When creating software, establishing a well-defined development environment is a
difficult undertaking. Python supports the developer here: a virtual environment can
be set up. It ensures that the defined package versions are always used, even if
individual software packages on the computer are updated.

17.1. Package python3-venv for Virtual Environments


On a PC system this is possible with the conda tool. Unfortunately, it cannot be used
on a Jetson Nano. Here the package python3-venv helps, which can be installed as
follows:
sudo apt install -y python3-venv

The following command creates a folder python-envs if it does not already exist, and
a new virtual environment in /home/msr/python-envs/env with the name env:
python3 -m venv ~/python-envs/env

After these two steps, the virtual environment can be used. To do this, it must be
activated:
source ~/python-envs/env/bin/activate

The terminal indicates that the environment is active by displaying the environment
name (env) before the user name. To deactivate the environment, simply run the
command deactivate.
Now individual packages can be installed. In order to be able to manage the installed
packages, the package wheel should also be installed; this is needed for the installation
of many subsequent packages.
pip3 install wheel

A new virtual environment should be created for each major project. This way, specific
package versions can be installed. If updating a package causes problems, the virtual
environment can simply be deleted. This is easier than reinstalling the entire system.
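For example, a specific package version can be pinned inside the environment and the installed packages documented in a file; the package name and version below are only illustrative:
pip3 install numpy==1.19.4
pip3 freeze > requirements.txt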

17.2. Package pipenv for Virtual Environments


The package pipenv enables the creation of a virtual environment. It maintains a Pipfile
that helps manage the packages, which can be easily installed or removed. After installing
pipenv, the tools pip and virtualenv can be used together to create a virtual
environment; the Pipfile works as a replacement for the file requirements.txt
and keeps track of the package versions for the given project.
If a computer with the Windows operating system is used, the tool Git Bash is to be
used as a terminal. For the macOS operating system, the tool Homebrew is available.
Linux users use the package management tool LinuxBrew.
To create a virtual environment with the package pipenv, a directory for the new
project MyProject must first be created.
mkdir MyProject


Now you have to change to this directory:


cd MyProject

In the Windows operating system the package pipenv can now be installed.
pip install pipenv

Linux and macOS users, on the other hand, can install the pipenv package with the
following command after installing LinuxBrew or Homebrew:
brew install pipenv

Now a new virtual environment can be created for the project with the following
command:
pipenv shell

The command creates a new virtual environment (using virtualenv) and a Pipfile
for the project. The necessary packages for the project can now be installed in the
virtual environment. The following commands install the packages requests and flask.
pipenv install requests
pipenv install flask

The command reports that the file Pipfile has been changed accordingly. The
following command can be used to uninstall the package.
pipenv uninstall flask

To be able to work outside the project, the virtual environment must be deactivated:
exit

The virtual environment can later be activated again with the command
source MyProject/bin/activate

17.3. Program Anaconda Prompt for Virtual Environments
The programme Anaconda Prompt is a tool that is installed together with an Anaconda
distribution. It provides a command line for creating Python programs. Figure 17.1
shows the information you get when calling the command conda.

With the installation of Anaconda there is an easy way to define a virtual environment.
The command
conda create -n MyProject python=3.7
creates a new virtual environment named MyProject using the specific Python version
of 3.7. After creating the virtual environment, it can be activated:
conda activate MyProject

The environment in use is shown in brackets at the beginning of the command line,
for example (MyProject) C:\>. After activation, the necessary packages can be installed.
For example, the package numpy will be installed.
package numpy will be installed.
conda install -n MyProject numpy

Figure 17.1.: help text for conda

Alternatively, the package manager inside the virtual environment can be used.
pip install numpy

For documentation purposes, all installed packages within a virtual environment can
be displayed. The following command is used for this, with the figure 17.2 showing a
possible output:
conda list

If one wants to use a different virtual environment, the current one must be deactivated:
conda deactivate

All created virtual environments can also be displayed. This facilitates their manage-
ment. The figure 17.3 shows a possible output.
conda env list

Finally, you can also remove a virtual environment. The following command removes
the virtual environment MyProject with all its packages.
conda env remove -n MyProject

17.4. Program Anaconda Navigator for Virtual Environments
If a graphical interface is preferred, the tool Anaconda Navigator can be used.
With it one can install and manage packages in Anaconda.

Figure 17.2.: list of installed packages

Figure 17.3.: list of all created virtual environments



Figure 17.4.: Anaconda Navigator: Calling the virtual environments area

After starting the programme, one can switch to the „Environments“ task according to
Figure 17.4, where the virtual environment base is always displayed. It is the default
setting.
In this dialogue one can press the button „Create“. Then the necessary information
must be entered to create a new virtual environment called new-env, see figure 17.5.
After that, you can display all installed packages and install new ones. To activate
the virtual environment, the corresponding environment must be selected.

Figure 17.5.: Anaconda Navigator: Creating a virtual environment


18. TensorFlow
18.1. Overview
18.1.1. TensorFlow Framework
TensorFlow is a machine learning system that operates at large scale and in heterogeneous
environments. It uses dataflow graphs to represent computation, shared
state, and the operations that mutate that state. TensorFlow can train and run
deep neural networks for handwritten digit classification, image recognition, word
embeddings, recurrent neural networks, sequence-to-sequence models for machine
translation, natural language processing, and simulations based on partial differential
equations (PDEs). It maps the nodes of a dataflow graph across many machines in a cluster,
and within a machine across multiple computational devices, including multicore CPUs,
general-purpose GPUs, and custom-designed ASICs known as Tensor Processing Units
(TPUs). TensorFlow enables developers to experiment with novel optimisations and
training algorithms. It supports a variety of applications, with a focus on
training and inference of deep neural networks. Several Google services use TensorFlow
in production, Google has released it as an open-source project, and it has become widely
used for machine learning research. The TensorFlow dataflow model achieves compelling
performance for several real-world applications.

18.1.2. TensorFlow Use Case


TensorFlow is a framework developed by Google for building machine learning
applications and training models. The Google search engine is also based on a TensorFlow
AI application: when a user types "a" in the Google search bar, the search engine
predicts the most suitable complete word to select, possibly also depending on previous
searches made on the same system by the user. TensorFlow can be used across
a range of tasks but has a particular focus on training and inference of deep neural
networks. It offers a collection of workflows for developing and training models
using the Python and JavaScript programming languages. After training, the model
can be deployed in the cloud or on any device for edge computing applications, no matter
which language is used on the device. The most important part of every machine learning
application is training the ML model, so that the model can be applied under real-time
conditions and take decisions without human intervention, as a human normally would.

18.1.3. TensorFlow Lite for Microcontrollers


TensorFlow is a critical piece of software infrastructure that is needed to train a
machine learning model, i.e. to teach an algorithm to look for interesting patterns in
data. If an edge computer is used, the user also has to train models in the context of
TinyML. TensorFlow is big, bulky and resource-hungry because it was originally designed
for large computing systems. If models are to be deployed on small embedded
microcontrollers, the system has to be made lean and mean; that is why TensorFlow Lite
was developed.


TensorFlow Lite, short “TFLite”, is a set of tools for converting and optimising
TensorFlow models for use on mobile and edge devices. As an Edge AI implementation,
TensorFlow Lite significantly reduces the challenges of combining large-scale
computer vision with on-device machine learning, allowing machine learning to be
used almost anywhere. TensorFlow Lite for Microcontrollers is a machine learning framework
designed to run on microcontrollers and other devices with limited memory. By bringing
machine learning to tiny microcontrollers, the intelligence of billions of devices in our lives,
including household appliances and Internet of Things devices, can be improved without
relying on expensive hardware or reliable internet connections, which are often subject
to bandwidth and power constraints and result in high latency. It also helps with data
security, as the data stays on the device. TensorFlow Lite is specially optimised for
on-device machine learning (Edge ML) and is therefore suitable for deployment on
resource-constrained edge devices. [Goo19c; DMM20]

18.2. Sample TensorFlow Code


While TensorFlow is written with fast custom C++ code, it has a high-level Python
interface that will be used here.
For example, it is possible to create a custom neural network model using code that
looks like Listing 18.1. Each part of the code will be presented later, but the following
points are important:

1. This is a custom class that extends the base class tf.keras.Model.

2. The class defines four specific kinds of tf.keras.layers.

3. The class has a call function which will run an input through those layers.

18.3. TensorFlow 2.0


TensorFlow is an open-source library for deep learning, often referred to as a framework
due to its scope. The free library has been available in version 2.0 since 30 September
2019. The scope of the library has led to its widespread use in both academia and
industry. [Goo19b]
The open-source library uses dataflow graphs to create models. It enables developers
to create large neural networks with many layers. TensorFlow is mainly used for
classification, perception, understanding, discovery and prediction. [Goo19b]

18.4. TensorFlow and Python


According to [RA20], the Python API is the most complete and most widely used interface
for TensorFlow and related libraries. Therefore, it makes sense to familiarise yourself with
Python if you want to work with TensorFlow.

18.5. Installation of TensorFlow


In addition to the standard version of the library, there are also variants that support
special hardware. Depending on availability, AI accelerators and graphics cards are
supported. In addition to the standard variant, variants for Intel hardware and
for NVIDIA graphics cards are also presented here. First, the standard variant is
presented, which does not require any special hardware.

# Load in the TensorFlow library
import tensorflow as tf

# Define my custom Neural Network model
class MyModel(tf.keras.Model):
    def __init__(self):
        super(MyModel, self).__init__()
        # define Neural Network layer types
        self.conv = tf.keras.layers.Conv2D(32, 3, activation='relu')
        self.flatten = tf.keras.layers.Flatten()
        self.dense1 = tf.keras.layers.Dense(128, activation='relu')
        self.dense2 = tf.keras.layers.Dense(10)

    # run my Neural Network model by evaluating
    # each layer on my input data
    def call(self, x):
        x = self.conv(x)
        x = self.flatten(x)
        x = self.dense1(x)
        x = self.dense2(x)
        return x

# Create an instance of the model
model = MyModel()

Listing 18.1.: Simple Example for TensorFlow



18.5.1. Standard Installation


For an Anaconda installation, the library is installed as follows.
First, the Anaconda Prompt must be opened. Then the latest version of pip should be installed with
$ pip install --upgrade pip
This may result in an error message about denied access, which recommends adapting the command accordingly:
$ pip install --upgrade pip --user
A pip installation damaged by a previously failed upgrade can be repaired with
$ easy_install pip
Finally, TensorFlow is installed with
$ pip install tensorflow

18.5.2. Installation of Intel Hardware


Intel has optimised the library for its hardware, such as the Intel Neural Compute
Stick 2. TensorFlow is installed specifically for Intel hardware, for example, using the
following command:
$ conda install -c intel tensorflow

18.5.3. Installation of Jetson Nano Hardware


The NVIDIA documentation describes the installation of TensorFlow for NVIDIA hardware,
in this case for the Jetson Nano. [NVI20a]
First, the necessary system packages required by TensorFlow are installed:
$ sudo apt-get update
$ sudo apt-get install libhdf5-serial-dev hdf5-tools \
libhdf5-dev zlib1g-dev zip libjpeg8-dev liblapack-dev \
libblas-dev gfortran
Install and update Pip3:
$ sudo apt-get install python3-pip
$ sudo pip3 install -U pip testresources setuptools==49.6.0
Install Python package dependencies:
$ sudo pip3 install -U numpy==1.16.1 future==0.18.2 mock==3.0.5 \
h5py==2.10.0 keras_preprocessing==1.1.1 keras_applications==1.0.8 \
gast==0.2.2 futures protobuf pybind11
The execution of the last command may take a few minutes. This is normal, even if
no progress of the activity is visible in the meantime.
Install a TensorFlow version that matches the JetPack version:
$ sudo pip3 install --extra-index-url \
https://developer.download.nvidia.com/compute/redist/jp/v$VERSION \
tensorflow
For the expression $VERSION the installed JetPack version must be used, where
44 stands for version 4.4, 45 for 4.5 and so on. The installed version can be found
by entering
dpkg-query --show nvidia-l4t-core
and matching the L4T number with the respective JetPack version on the page
https://developer.nvidia.com/embedded/jetpack-archive. Performing this step
also takes a few minutes.
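Assuming, for example, that JetPack 4.6 is installed, the command would be (the version number 46 is inserted here only for illustration):
$ sudo pip3 install --extra-index-url \
https://developer.download.nvidia.com/compute/redist/jp/v46 \
tensorflow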

from tensorflow.keras.datasets import mnist

train_da, test_da = mnist.load_data()
x_train, y_train = train_da
x_test, y_test = test_da

Listing 18.2.: Loading the data set MNIST

18.6. “Hello World 1” of Machine Learning - Handwritten Digits
The typical first programme in Deep Learning is the recognition of handwritten digits
from the MNIST dataset. Also on the Web page of TensorFlow, under the Quick
Start for Beginners, there is a short tutorial on training a neural network based on
the MNIST dataset. The following presentation follows the tutorial from [RA20].

In this section, the complete process of training a neural network is shown. The starting
point is a data set containing 70,000 images, which is already divided into test and
training images. Since the task covers the ten digits, the number of classes is ten. A
neural network with one layer is used. All in all, this results in a short training time.
Since the images only have a resolution of 28 × 28-pixel and the neural network is
very small, only an accuracy of about 60 % is achieved. Extending the neural network
with a hidden layer containing 200 neurons improves the result significantly to over 80
%. A further improvement is achieved by using a CNN. This achieves an accuracy
of 99 %. During the training, the results are displayed graphically with the help of
Tensorboard.

Formalities must be observed when training a neural network with TensorFlow. The
following sequence must be taken into account:

1. Loading and testing the dataset MNIST as a dataset provided by TensorFlow.


2. Normalisation of the images into floating point numbers
3. Mixing and shaping a training batch with a defined number of examples
4. Creation of the neural network and optimisation with Adam
5. Specification of the additional metric accuracy
6. Training/testing the model by calling the function model.fit

18.6.1. Loading the Data Set MNIST


The command load_data() loads the data and immediately splits it into two parts.
The data then contain two lists each, the first list containing the images and the
second list the corresponding classification. The images are loaded in the list with the
designation x, the classification with y.
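A quick check of the loaded arrays shows the expected split of the 70,000 images into 60,000 training and 10,000 test images of 28 × 28 pixels (a small sketch; the commented shapes follow from the MNIST data set):

print(x_train.shape, y_train.shape)   # (60000, 28, 28) (60000,)
print(x_test.shape, y_test.shape)     # (10000, 28, 28) (10000,)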

18.6.2. Data Preparation


A neural network expects data as 32-bit floating point numbers in the value range
between 0 and 1. In the following code, the image data is therefore normalised accordingly
and brought into a 28 × 28 format with an explicit channel dimension. The choice of
normalisation depends on the selected network structure.

import tensorflow.keras.backend as K
from tensorflow.keras.utils import to_categorical

dat_form = K.image_data_format()
rows, cols = 28, 28
train_size = x_train.shape[0]
test_size = x_test.shape[0]

if dat_form == 'channels_first':
    x_train = x_train.reshape(train_size, 1, rows, cols)
    x_test = x_test.reshape(test_size, 1, rows, cols)
    input_shape = (1, rows, cols)
else:
    x_train = x_train.reshape(train_size, rows, cols, 1)
    x_test = x_test.reshape(test_size, rows, cols, 1)
    input_shape = (rows, cols, 1)

# norm data to float in range 0..1
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

# conv class vecs to one hot vec
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

Listing 18.3.: Normalisation of the data set MNIST



x_train = x_train[:100]
y_train = y_train[:100]

Listing 18.4.: Limitation of the fields to 100 elements

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten

model = Sequential()
model.add(Flatten())
model.add(Dense(10, activation='softmax'))

Listing 18.5.: Building a model with two layers

To reduce the training time for the first trials, instead of starting with all training
data, the data is initially limited to 100 images:

18.6.3. Structure and Training of the Model


Before the model can be used, it must be built. To do this, the modules to be used
must first be imported. This model is created with the class Sequential. First, a
very simple model is used that consists of only two layers.

The command flatten() transforms the input data, here a field with 28 by 28 entries,
into a one-dimensional vector of size 784. The function model.add() adds each of
the calculations as a separate layer to the model. Then the function Dense() starts
the calculation of the activation function of the neurons. In this case there are ten
neurons and the activation function softmax. Each of the ten neurons corresponds to
one possible result, i.e. one of the ten digits. Each neuron, i.e. each possible outcome,
is assigned a probability by the function softmax, so that the sum of all neurons is 1.
A value close to 1 is to be interpreted in such a way that the associated result can
be assumed with a high degree of certainty. If all values are close to 0, no reliable
statement can be made.
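As a small illustration of what softmax does, the following sketch (using NumPy, independent of the model above) turns three raw scores into probabilities that sum to 1:

import numpy as np

# softmax sketch: turn raw scores into probabilities that sum to 1
def softmax(z):
    e = np.exp(z - np.max(z))   # subtract the maximum for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
print(softmax(scores))          # approximately [0.659 0.242 0.099]
print(softmax(scores).sum())    # 1.0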
TensorFlow uses the above construction to build a graph for the calculations, these are
automatically optimised and transformed into a format that allows efficient calculation
on the hardware. In practical use, the values are calculated from input to output. For
training, TensorFlow automatically determines the derivative of the loss function and
adds the gradient descent calculations to the graph. One function is sufficient for this:
The command passes the loss function categorical_crossentropy to the model.
The selected loss function computes the cross-entropy between the predicted probability
distribution and the one-hot encoded target vector. To determine the optimal weights
in the neural network, the algorithm Adam() is used, which is a variant of gradient
descent. The algorithm is very robust,

from tensorflow.keras.losses import categorical_crossentropy
from tensorflow.keras.optimizers import Adam

model.compile(
    loss=categorical_crossentropy,
    optimizer=Adam(),
    metrics=['accuracy'])

Listing 18.6.: Preparation of the training



history = model.fit(x_train, y_train,
                    batch_size=128,
                    epochs=12, verbose=1,
                    validation_data=(x_test, y_test))

Listing 18.7.: Implementation of the training

so that for most applications no changes to its parameters are necessary. Through
the parameter metrics=[’accuracy’], the number of images correctly classified by the
neural network is counted during the calculations.

The training can be started with the command model.fit():


The function expects several parameters. First, the training data must be passed,
followed by the classifications. The third value is the batch size, it specifies how much
data is added to the calculation at the same time. On the one hand, the value should
be as large as possible to achieve a good result, on the other hand, the value is limited
by the hardware. This is because all input data of a batch are processed simultaneously
in the computer’s memory. The parameter epochs specifies how often the training
data of a batch are run through during training. More complex problems may require
more epochs and a lower learning rate.
After each training run, the argument validation_data=(x_test, y_test) is used to
determine how well the model performs on unknown data.
The return value of the function model.fit() is an object history. With the help
of this object, the learning success can be tracked. The visualisation can be done with
TensorBoard, as shown in the next section.

18.6.4. History of the Training and Visualisation with TensorBoard


When the cell is executed with model.fit(), the output is the history. The history
for the example created above looks as follows:
Epoch 1/12
1/1 [==============================] - 1s 705ms/step - loss: 2.3428 - accurac
Epoch 2/12
1/1 [==============================] - 0s 235ms/step - loss: 2.2764 - accurac
Epoch 3/12
1/1 [==============================] - 0s 235ms/step - loss: 2.2125 - accurac
Epoch 4/12
1/1 [==============================] - 0s 227ms/step - loss: 2.1507 - accurac
Epoch 5/12
1/1 [==============================] - 0s 217ms/step - loss: 2.0909 - accurac
Epoch 6/12
1/1 [==============================] - 0s 211ms/step - loss: 2.0330 - accurac
Epoch 7/12
1/1 [==============================] - 0s 218ms/step - loss: 1.9766 - accurac
Epoch 8/12
1/1 [==============================] - 0s 198ms/step - loss: 1.9218 - accurac
Epoch 9/12
1/1 [==============================] - 0s 216ms/step - loss: 1.8685 - accurac
Epoch 10/12
1/1 [==============================] - 0s 232ms/step - loss: 1.8164 - accurac
Epoch 11/12
1/1 [==============================] - 0s 238ms/step - loss: 1.7656 - accurac
Epoch 12/12
1/1 [==============================] - 0s 240ms/step - loss: 1.7160 - accurac

history = model.fit(x_train, y_train,
                    batch_size=128,
                    epochs=12, verbose=1,
                    validation_data=(x_test, y_test),
                    callbacks=[tensorboard_callback])

Listing 18.8.: Carrying out the training with a callback function



With the tool TensorBoard the history can also be visualised. An introduction to the
use of TensorBoard can be found at [Goo10].
To load TensorBoard, the following command should be executed at the beginning of the script:
%load_ext tensorboard

If only the history of the respective run is to be visualised, any data already contained
in the log directory must be deleted. According to [Goo10], this should be possible with
the command
!rm -rf ./logs/

However, this does not work in every case. As an alternative,


import shutil
shutil.rmtree(’./logs’,ignore_errors=True)

can be used. Before executing model.fit(), the log file and the callback function
must be prepared:
log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, \
histogram_freq=1)

With this callback function, the data, named after the time of the training, is stored.
During training, the callback function must be passed to model.fit(), which is why
this part must be adapted as shown in Listing 18.8:
With the input
%tensorboard --logdir logs/fit

the training is visualised with TensorBoard after it is completed. For the simple model
built above, the history of the training in TensorBoard looks as shown in Figure 18.1.

Figure 18.1.: Training history of the simple model visualised with TensorBoard

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.layers import Dropout

model = Sequential()
model.add(Flatten())
model.add(Dense(200, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

Listing 18.9.: Neural network with a second hidden layer and the dropout function

18.6.5. Problems with TensorBoard


When using TensorBoard, problems can occur when establishing the connection,
loading the data or loading the „correct“ data. If, for example, data is still displayed
that has been deleted in the meantime, or if for some other reason a complete shutdown
and restart of the TensorBoard is to be forced because Tensorboard no longer responds,
the following commands - to be entered in the Windows command prompt - may be
helpful:

taskkill /im tensorboard.exe /f


del /q %TMP%\.tensorboard-info\*

An error message may appear that the file cannot be found. This can be ignored.
After a short time, the TensorBoard should then be able to be reloaded.

18.6.6. Neural network


To make the neural network „more intelligent“, an additional layer is added between
input and output, a so-called hidden layer. This layer consists of 200 neurons with the
ReLU activation function. Another improvement can be achieved by dropout, which avoids
overfitting, i.e. memorising the training data. This could also be achieved by substantially
increasing the amount of training data, but of course this also entails a much higher
computation time. Instead, a layer is inserted between the two layers with neurons,
which randomly discards half of all information. The new definition of the model looks
as follows:
As can be seen in Figure 18.2, training with the neural network and dropout leads to
better results.

Figure 18.2.: Training history of the simple model visualised with TensorBoard

from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D

model.add(Conv2D(32,
                 kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
model.add(Conv2D(64,
                 kernel_size=(3, 3),
                 activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(200,
                activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10,
                activation='softmax'))

Listing 18.10.: Building a Convolutional Neural Network

18.6.7. Convolutional Neural Network


Convolutional neural networks are particularly suitable for image classification tasks.
Therefore, the idea is naturally to use them here as well. For this purpose, the model
is redefined again:

There are now two convolutional layers and one pooling layer in this model. The
newly added Conv2D() layers belong in front of the existing Flatten() layer because
they use the two-dimensional structure of the input images. The first layer learns 32
filters with a matrix size of 3 × 3, the second 64 filters. This is followed by a max
pooling layer that keeps only the largest value from each 2 × 2 field.
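A small sketch with NumPy (independent of Keras) illustrates what 2 × 2 max pooling does to a 4 × 4 input:

import numpy as np

# 2x2 max pooling: keep only the largest value of each 2x2 block
x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 1],
              [0, 1, 5, 6],
              [2, 2, 7, 3]])
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[4 2]
#  [2 7]]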
With the training data set limited to 100 samples, even this training is still fast. The
history is shown in Figure 18.3.

Figure 18.3.: Training history of the simple model visualised with TensorBoard

If you train over 12 epochs with all 60,000 training data, the training on a laptop
with an AMD Ryzen 5 3500U processor takes over 15 minutes (about 1.5 minutes
per epoch at about 200ms per step). In return, an accuracy of over 99 % is already
achieved in the sixth run, which TensorBoard displays as the fifth (see figure 18.4).

Figure 18.4.: Training history of the simple model visualised with TensorBoard

18.6.8. Complete code


The complete code with the previously described steps and the final CNN model is
recorded in Listing 18.11. It was created with Jupyter Notebook.

18.7. “Hello World 2” - Detection of an Iris

The classification of iris species is another classic example. There are also several
tutorials on this classic; the following presentation is based on the tutorial by
[Ani20].
The aim of this tutorial is to classify three different iris species based on four different
attributes, the length and width of the sepal and the petal. That is, the starting
point is a file in CSV format. However, only 50 measurements are available for each iris
species. The class labels are first encoded as a „OneHotVector“ so that they can be
processed with a neural network. The neural network used consists of nine hidden
layers and one output layer. Five hidden layers have 64 neurons and four have 128.
The activation function ReLu is used for all hidden layers. In the output layer, the
activation function is SoftMax so that each class is assigned a probability. An accuracy
of approximately 87 % is achieved. Finally, to avoid overfitting, regularisation is used.

18.7.1. Loading Fisher’s Iris Data Set


The Fisher’s Iris data set is delivered with the library sklearn.
Other libraries must be loaded for image recognition:
The libraries numpy, pandas and pyplot are used for the preparation and visualisation
of the data and the presentation of the results. In a neural network, the calculations
are processed sequentially. The model type Sequential is imported to match this.
The model is built up from individual layers that are processed sequentially. The
layer type Dense is then imported. The layer type is a type of layer that is „dicht“
connected, which means that all nodes of the previous layer are connected to all nodes
of the current layer.
The record is a dictionary. Its keys can be easily displayed:
iris.keys()

You get as output:


dict_keys([’data’, ’target’, ’frame’, ’target_names’, ’DESCR’, ’fea-
ture_names’, ’filename’])

The individual elements can now be viewed. After entering the command
iris[’DESCR’]

#!/usr/bin/env python
# coding: utf-8

# Training of a CNN with the MNIST dataset and visualization with tensorboard
# based on the Tutorial in c't Python-Projekte and https://www.tensorflow.org/ten

# In[ ]:

# preparation for tensorboard
get_ipython().run_line_magic('load_ext', 'tensorboard')
import tensorflow as tf
import datetime

# In[ ]:

# clear previous logs
import shutil
shutil.rmtree('./logs', ignore_errors=True)

# In[ ]:

# load mnist dataset
from tensorflow.keras.datasets import mnist
train_da, test_da = mnist.load_data()
x_train, y_train = train_da
x_test, y_test = test_da

# In[ ]:

# data preparation/transformation
import tensorflow.keras.backend as K
from tensorflow.keras.utils import to_categorical
dat_form = K.image_data_format()
rows, cols = 28, 28
train_size = x_train.shape[0]
test_size = x_test.shape[0]
if dat_form == 'channels_first':
    x_train = x_train.reshape(train_size, 1, rows, cols)
    x_test = x_test.reshape(test_size, 1, rows, cols)
    input_shape = (1, rows, cols)
else:
    x_train = x_train.reshape(train_size, rows, cols, 1)
    x_test = x_test.reshape(test_size, rows, cols, 1)
    input_shape = (rows, cols, 1)
# norm data to float in range 0..1
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
# conv class vecs to one hot vec
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

# In[ ]:

from sklearn.datasets import load_iris
iris = load_iris()

Listing 18.12.: Loading the data set Iris

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

Listing 18.13.: Libraries for Hello World 2 - Iris

a detailed description is output:


With the input
iris[’feature_names’]
it becomes clear that the characteristics by which flowers are classified are the length
and width of the sepal and petal:
['sepal length (cm)',
 'sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)']
The names of the flowers that are displayed after entering
iris[’target_names’]
are
array(['setosa', 'versicolor', 'virginica'], dtype='<U10')
For further investigation, the data with the headings are included in a data frame.
x = pd.DataFrame(data = iris.data, columns = iris.feature_names)
print(x.head())
The data structure x now contains only the four feature columns. The class will later be
stored in the data structure y. This separation is done in preparation for the
training. The command x.head() shows – as in Figure 18.5 – the head of the data
frame. It is obvious that each record consists of four values.

Figure 18.5.: Header lines of the data set Iris

Each record also already contains its classification in the key target. In the figure 18.6
this is listed for the first data sets.
y = pd.DataFrame(data=iris.target, columns = [’irisType’])

.. _iris_dataset:

Iris plants dataset
--------------------

**Data Set Characteristics:**

    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, predictive attributes and the class
    :Attribute Information:
        - sepal length in cm
        - sepal width in cm
        - petal length in cm
        - petal width in cm
        - class:
                - Iris-Setosa
                - Iris-Versicolour
                - Iris-Virginica

    :Summary Statistics:

    ==============  ====  ====  =======  =====  ====================
                     Min   Max    Mean     SD    Class Correlation
    ==============  ====  ====  =======  =====  ====================
    sepal length:    4.3   7.9    5.84   0.83     0.7826
    sepal width:     2.0   4.4    3.05   0.43    -0.4194
    petal length:    1.0   6.9    3.76   1.76     0.9490  (high!)
    petal width:     0.1   2.5    1.20   0.76     0.9565  (high!)
    ==============  ====  ====  =======  =====  ====================

    :Missing Attribute Values: None
    :Class Distribution: 33.3% for each of 3 classes.
    :Creator: R.A. Fisher
    :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
    :Date: July, 1988

    The famous Iris database, first used by Sir R.A. Fisher. The dataset is taken
    from Fisher's paper. Note that it's the same as in R, but not as in the UCI
    Machine Learning Repository, which has two wrong data points.

    This is perhaps the best known database to be found in the pattern recognition
    literature. Fisher's paper is a classic in the field and is referenced
    frequently to this day. (See Duda & Hart, for example.) The data set contains
    3 classes of 50 instances each, where each class refers to a type of iris plant.
    One class is linearly separable from the other 2; the latter are NOT linearly
    separable from each other.

    .. topic:: References

    - Fisher, R.A. "The use of multiple measurements in taxonomic problems"
      Annual Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to
      Mathematical Statistics" (John Wiley, NY, 1950).
    - Duda, R.O., & Hart, P.E. (1973) Pattern Classification and Scene Analysis.
      (Q327.D83) John Wiley & Sons. ISBN 0-471-22361-1. See page 218.
    - Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System
      Structure and Classification Rule for Recognition in Partially Exposed
      Environments". IEEE Transactions on Pattern Analysis and Machine
      Intelligence, Vol. PAMI-2, No. 1, 67-71.
    - Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule". IEEE Transactions
      on Information Theory, May 1972, 431-433.
    - See also: 1988 MLC Proceedings, 54-64. Cheeseman et al's AUTOCLASS II
      conceptual clustering system finds 3 classes in the data.
    - Many, many more ...

Listing 18.14.: Information on the data set Iris from scikit-learn



y.head()

Figure 18.6.: Output of the categories of the data set Iris

By means of the command


y.irisType.value_counts()
one can determine how many classes are present. The output of the command
is shown in the figure 18.7; it results in 3 classes with the numbers 0, 1 and 2. There
are 50 records assigned to each of them.

Figure 18.7.: Names of the categories in the data set Iris

18.7.2. Data preparation


Now the process can be started. First, the data must be prepared. In this case, the
following steps are necessary:

1. Missing values must be completed.


2. The data must be split into training and validation data.
3. The data is normalised.
4. The data representing categories must be converted into a vector.
5. The data is converted into vectors.

Missing Values
With the command
pandas.DataFrame.info()
one can check whether the data is consistent. The figure 18.8 shows that the
data is complete and the data type is identical for all values. Consequently, no action
is required for this data set.

x.info()

Figure 18.8.: Checking the data set Iris

Split into Training and Validation Data


To split a dataset into training and validation data, the function train_test_split
is available in the library sklearn in the module model_selection. Here 10% of the
data set is retained as validation data.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.1)

Normalisation of the Data


If the data have a high variance, they are to be normalised. Therefore, the variance
is determined first. The function var() is available for this from the class
pandas.DataFrame. According to Figure 18.9, the output shows that the variance is low
and therefore no action is required.
x_train.var(), x_test.var()

Figure 18.9.: Variance in the data set Iris

Converting the Categories into a Vector


From examining the data set, it is known that three categories are included. Currently
they are numbered 0, 1 and 2. The algorithm used could assign higher values a higher

priority, so that the result is distorted. To avoid this, a vector is generated that
contains only the 0 and 1 values, thus encoding the category. This is a so-called
„OneHotVector“. There are two possibilities for this. One could use the function
to_categorical of the library Keras or the function OneHotEncoder of the library
sklearn.
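A small sketch shows what such a one-hot encoding looks like for the three class numbers 0, 1 and 2, using the function to_categorical that is also used below:

from tensorflow.keras.utils import to_categorical

# each class number becomes a vector with a single 1 at the position of the class
print(to_categorical([0, 1, 2]))
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]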

y_train = tf.keras.utils.to_categorical(y_train)
y_test = tf.keras.utils.to_categorical(y_test)

The right-hand side of figure 18.10 shows the result for the first five records. It is displayed with
y_train[:5,:]

The left side of the figure 18.10 shows the original coding of the categories. Here you
can also see which data sets were randomly selected as the first five training data.

Figure 18.10.: Coding of the categories before (left) and after conversion

Conversion of the Data into Vectors


To make the data easier to process, it is converted into vectors of the library numpy.
x_train = x_train.values
x_test = x_test.values

The result of the conversion can be checked in the first data set:
x_train[0]

The result is as follows:


array([5.5, 2.4, 3.7, 1. ])

18.7.3. Structure and Training of the Model


After the preparations, the sequential model can now be created and the layers added.
In this example, the model consists of one input layer, eight hidden layers and one
output layer. The activation function of the first layers is relu, only for the output
layer the activation function softmax is used, because it assigns probabilities to the
possible categories. In the case where there are only two possible categories, the
activation function sigmoid would be chosen. Besides the activation function relu,
the functions sigmoid, linear or tanh are also available. The experiments have
shown that the activation function relu gives the best results.
The input layer contains an additional argument input_shape. This argument ensures
that the dimension of the first layer is based on the dimension of a data set. In this
example, the dimension is four.

model1 = Sequential()
model1.add(Dense(64, activation='relu', input_shape=x_train[0].shape))
model1.add(Dense(128, activation='relu'))
model1.add(Dense(128, activation='relu'))
model1.add(Dense(128, activation='relu'))
model1.add(Dense(128, activation='relu'))
model1.add(Dense(64, activation='relu'))
model1.add(Dense(64, activation='relu'))
model1.add(Dense(64, activation='relu'))
model1.add(Dense(64, activation='relu'))
model1.add(Dense(3, activation='softmax'))

Listing 18.15.: Structure of the neural network for the data set Iris

The next step is to compile the model. For this, the optimiser, the loss function
and the metric must be passed.
model1.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])
Several candidates are available for the optimisation function:

• Stochastic Gradient Descent,


• RMSProp or also
• adam.

Here, as in the previous example, adam was used.


For the loss function, categorical_crossentropy is used here, which is mostly
used in classification tasks with several classes where only one category is correct at a
time. If only two categories are available, binary_crossentropy is used instead.
Metrics are important for evaluating a model, and several metrics are available. For
classification problems, the most important metric is the accuracy ([’acc’]), which
indicates how accurate the predictions are.
In the last step, the model is now trained using the training data.
history = model1.fit(x_train, y_train, batch_size=40, epochs=800, validation_split=0.1)
For the creation of the model, 800 epochs with a batch size of 40 were set here. As
validation data, 10% of the training data set is taken at random.

18.7.4. History of the Training and Visualisation with TensorBoard


The fit method returns a data structure that contains the entire history of our
training. This allows us to track how the quality has evolved during the training. For
example, a hint of overfitting can be detected by much better performance in the
training data than in the validation data. Furthermore, one can see if the quality
behaves asymptotically in the last epochs or if the training should be extended or can
be shortened.
The return value is a dictionary that can be accessed via history.history. Here one
can, among other things, follow the development of the accuracy in the training as well
as in the validation data set. The history of the elements acc, loss, val_loss and
val_acc can be accessed with history.history[’acc’], history.history[’val_acc’]
and so on.

In the figure 18.11 the progression of accuracy over the epochs is shown. Only five
lines need to be programmed to generate the graph:

plt.plot(history.history[’acc’])

plt.plot(history.history[’val_acc’])

plt.xlabel(’Epochs’)

plt.ylabel(’Acc’)

plt.legend([’Training’, ’Validation’], loc=’upper right’)

Figure 18.11.: Course of the accuracy for the training and test data during the training
of the model

The course of the loss function can be drawn analogously:

plt.plot(history.history[’loss’])

plt.plot(history.history[’val_loss’])

plt.xlabel(’Epochs’)

plt.ylabel(’Loss’)

plt.legend([’Training’, ’Validation’], loc=’upper left’)



Figure 18.12.: Course of the loss function for the training and test data during the
training of the model

According to [Ani20], we expect to observe an overfitting, which manifests itself in


the fact that the results for the training data are significantly better than for the
validation data. However, this problem is not evident in the results obtained. The
validation data even perform better than the training data. Compare this also with
the plots of [Ani20]. Possibly different versions of TensorFlow were used here, which
lead to different results.
To check the model, the function model.evaluate can be used; the method expects
the test data with the associated categories:

model1.evaluate(x_test, y_test)

The achieved accuracy on the test data is about 87 %.

18.7.5. Regularisation

To achieve a better model, regularisation is introduced. This reduces the risk of


overfitting, which was expected for the first model. In this model, an L2 regularisation
is added. This adds the squared magnitude of each weight coefficient as a penalty term
to the loss function used for determining the weights.
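In Keras this is done per layer via the argument kernel_regularizer, as in the following minimal sketch; the regularisation factor 0.001 is only an illustrative value and is not taken from the tutorial:

import tensorflow as tf

# Dense layer whose weights are penalised with an L2 term added to the loss
layer = tf.keras.layers.Dense(
    128, activation='relu',
    kernel_regularizer=tf.keras.regularizers.l2(0.001))  # factor chosen for illustration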
As a second measure to reduce overfitting and improve the model, dropout layers are
added.
Apart from the introduction of regularisation and the two dropout layers, nothing is
changed in the model or in the parameters.

An accuracy of 87 % is also achieved for this model using the test data.
The course of the accuracy and the evaluation function are shown in the figures 18.13
and 18.14.

model2 = Sequential()
model2.add(Dense(64, activation='relu', input_shape=x_train[0].shape))
model2.add(Dense(128, activation='relu', kernel_regularizer=tf.keras.regularizers.l2(...)))
model2.add(Dense(128, activation='relu', kernel_regularizer=tf.keras.regularizers.l2(...)))
model2.add(tf.keras.layers.Dropout(0.5))
model2.add(Dense(128, activation='relu', kernel_regularizer=tf.keras.regularizers.l2(...)))
model2.add(Dense(128, activation='relu', kernel_regularizer=tf.keras.regularizers.l2(...)))
model2.add(Dense(64, activation='relu', kernel_regularizer=tf.keras.regularizers.l2(...)))
model2.add(Dense(64, activation='relu', kernel_regularizer=tf.keras.regularizers.l2(...)))
model2.add(tf.keras.layers.Dropout(0.5))
model2.add(Dense(64, activation='relu', kernel_regularizer=tf.keras.regularizers.l2(...)))
model2.add(Dense(64, activation='relu', kernel_regularizer=tf.keras.regularizers.l2(...)))
model2.add(Dense(3, activation='softmax', kernel_regularizer=tf.keras.regularizers.l2(...)))

Listing 18.16.: Construction of the neural network for the data set Iris - Improved
version

model2.compile(optimizer='adam',
               loss='categorical_crossentropy',
               metrics=['acc'])
history2 = model2.fit(x_train, y_train,
                      epochs=800,
                      validation_split=0.1, batch_size=40)
model2.evaluate(x_test, y_test)

Listing 18.17.: Training of the second model

plt.plot(history2.history['acc'])
plt.plot(history2.history['val_acc'])
plt.title('Accuracy vs. epochs')
plt.ylabel('Acc')
plt.xlabel('Epoch')
plt.legend(['Training', 'Validation'], loc='lower right')
plt.show()

Listing 18.18.: Course of the accuracy and the evaluation function



plt.plot(history2.history['loss'])
plt.plot(history2.history['val_loss'])
plt.title('Loss vs. epochs')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Training', 'Validation'], loc='upper right')
plt.show()

Listing 18.19.: Course of the loss function

Figure 18.13.: Course of accuracy for the training and test data during the training of
the second model

The Python program part for the loss function is given in Listing 18.19.

Figure 18.14.: Course of the loss function for the training and test data during the
training of the second model

18.7.6. Complete Code


The entire script of the tutorial described can be found in Listing 18.20.

18.8. „Hello World 3“- Bottle Detection


In this section, a neural network is trained to recognise bottles. In the previous
examples, an existing data set was used. Based on the known data sets CIFAR-10
and CIFAR-100, a new data set is generated here. In this case, the images have three
colour channels and a resolution of 32 × 32 pixels. The images are uniquely assigned to classes.

18.8.1. Dataset CIFAR-10 and CIFAR-100


In this example, the CIFAR-10 and CIFAR-100 datasets are used. They were developed
by Alex Krizhevsky and his team and are named after the Canadian Institute
for Advanced Research. Both datasets contain 32 × 32-pixel colour images with three
colour channels assigned to corresponding categories between which there is no overlap.
[KH09]
The data set CIFAR-10 contains a total of 60,000 images in 10 categories (airplane,
automobile, bird, cat, deer, dog, frog, horse, ship, truck) with 6,000 images each. The
data set is already divided into 50,000 training images and 10,000 test images.
The data set CIFAR-100, on the other hand, also contains 60,000 images, but divided
into 100 categories with 600 images each. In addition, there are 20 superclasses
to which the 100 categories are assigned (see Figure 18.15).

18.8.2. Loading the Dataset


The datasets can be downloaded or imported directly into TensorFlow via Keras
[Goo20d; Raj20; Kaga; KSH17]:
import tensorflow as tf
from tensorflow.keras import datasets

#!/usr/bin/env python
# coding: utf-8

# In[ ]:

# Tutorial for Iris Dataset
# code based on the tutorial @ https://www.kdnuggets.com/2020/07/getting-started-t

# In[ ]:

# load data
from sklearn.datasets import load_iris
iris = load_iris()

# In[ ]:

# load other needed libraries
from sklearn.model_selection import train_test_split  # to split data

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import tensorflow as tf
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

# In[ ]:

# convert into data frame
X = pd.DataFrame(data=iris.data, columns=iris.feature_names)
y = pd.DataFrame(data=iris.target, columns=['irisType'])

# In[ ]:

# check if data is complete
X.info()

# In[ ]:

# Split data into training and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1)

# In[ ]:

# check variance
X_train.var(), X_test.var()

# In[ ]:
Superclass: classes
aquatic mammals: beaver, dolphin, otter, seal, whale
fish: aquarium fish, flatfish, ray, shark, trout
flowers: orchids, poppies, roses, sunflowers, tulips
food containers: bottles, bowls, cans, cups, plates
fruit and vegetables: apples, mushrooms, oranges, pears, sweet peppers
household electrical devices: clock, computer keyboard, lamp, telephone, television
household furniture: bed, chair, couch, table, wardrobe
insects: bee, beetle, butterfly, caterpillar, cockroach
large carnivores: bear, leopard, lion, tiger, wolf
large man-made outdoor things: bridge, castle, house, road, skyscraper
large natural outdoor scenes: cloud, forest, mountain, plain, sea
large omnivores and herbivores: camel, cattle, chimpanzee, elephant, kangaroo
medium-sized mammals: fox, porcupine, possum, raccoon, skunk
non-insect invertebrates: crab, lobster, snail, spider, worm
people: baby, boy, girl, man, woman
reptiles: crocodile, dinosaur, lizard, snake, turtle
small mammals: hamster, mouse, rabbit, shrew, squirrel
trees: maple, oak, palm, pine, willow
vehicles 1: bicycle, bus, motorcycle, pickup truck, train
vehicles 2: lawn-mower, rocket, streetcar, tank, tractor

Figure 18.15.: Categories in CIFAR-100

(X_train10, y_train10), (X_test10, y_test10) = \
    tf.keras.datasets.cifar10.load_data()
(X_train100, y_train100), (X_test100, y_test100) = \
    tf.keras.datasets.cifar100.load_data()

18.8.3. Data Preparation


The data set CIFAR-10 is to be merged with the data of the category „bottles“ from
the data set CIFAR-100 to form a new data set. pyplot is imported for visualisation
and numpy for manipulating the datasets. In addition, the required modules of Keras
are imported and the datasets CIFAR-10 and CIFAR-100 are loaded.
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
(X_train10, y_train10), (X_test10, y_test10) = \
tf.keras.datasets.cifar10.load_data()
(X_train100, y_train100), (X_test100, y_test100) = \
tf.keras.datasets.cifar100.load_data()
The shape of the training data can be checked by displaying the property .shape for
the training data of both data sets:
print('Images Shape: {}'.format(X_train10.shape))
print('Labels Shape: {}'.format(y_train10.shape))
print('Images Shape: {}'.format(X_train100.shape))
print('Labels Shape: {}'.format(y_train100.shape))
The result is:
Images Shape: (50000, 32, 32, 3)
Labels Shape: (50000, 1)
Images Shape: (50000, 32, 32, 3)
Labels Shape: (50000, 1)

That is, each data set provides 50,000 training images of 32×32 pixels with three
colour channels. A label vector of length 50,000 is assigned, which contains the
corresponding labels. To see in which form the labels are coded, one can look at the
first ten entries:

print(y_train10[:10])

[[6]
 [9]
 [9]
 [4]
 [1]
 [1]
 [2]
 [7]
 [8]
 [3]]

If you look at several entries, you will notice that the ten categories are coded with the
numbers 0 to 9. This means that the vector with the categories is to be transformed
into the OneHotVector representation.
In Figure 18.16 the first 25 training data are shown with associated labels. This can
be easily programmed:

plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(X_train10[i], cmap=plt.cm.binary)
    plt.xlabel(y_train10[i])
plt.show()

Figure 18.16.: Visualisation of the first 25 training data from the data set CIFAR-10

Similarly, examples from the CIFAR-100 dataset can be displayed in Figure 18.17:

plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(X_train100[i], cmap=plt.cm.binary)
    plt.xlabel(y_train100[i])
plt.show()

Figure 18.17.: Visualisation of the first 25 training data from the data set CIFAR-100

It can be seen that the categories are numbered in alphabetical order (0 – apples,
1 – aquarium fish, ...; cf. Figure 18.15). Accordingly, the category „bottles“ has index 9,
which can be checked by displaying the corresponding images.
Since the goal is to add only the images of the category „bottles“ to CIFAR-10, the
CIFAR-100 dataset has to be filtered and reduced so that only the images with index
9 remain. This is done by searching for all entries with the value 9 in the label vector
and applying the resulting mask to the training data to reduce it to these entries.
Since the filtered data will be added to a dataset in which index 9 is already assigned
to an existing category, the label is also changed to 10 in this step.
idx = (y_train100 == 9).reshape(X_train100.shape[0])
X_train100 = X_train100[idx]
y_train100 = y_train100[idx]
for i in range(len(y_train100)):
y_train100[i]=10
With the query
len(X_train100)
we find that the number of training data in the CIFAR-100 dataset has reduced from
50,000 to 500, which is exactly the number of expected entries per category. For a
rough check, we can again display the first 25 entries from the modified CIFAR-100
dataset. As can be seen in Figure 18.18, judging from both the image and the label,
these consist exclusively of the desired category.

Figure 18.18.: Visualisation of the first 25 training data in the filtered CIFAR-100
dataset

Unfortunately, the two data sets cannot simply be merged directly. In that case, the added
category would only have 500 images, while there are 5,000 images for each of the
other categories. As a result, this category would not be trained sufficiently and would
be recognised less reliably than the other categories. There are now two possibilities. From
the 500 images, 5,000 images could be generated, but this is costly. The alternative
is to reduce the images of the other categories to 500 each. For this purpose, a similar
procedure is now also applied to the data from the CIFAR-10 dataset in order to
represent all categories equally in the new dataset.
X_train10_red = [None]*5000
y_train10_red = [None]*5000
for i in range(10):
idx = (y_train10 == i).reshape(X_train10.shape[0])
x=X_train10[idx]
y=y_train10[idx]
X_train10_red[i*500:i*500+500] = x[0:500]
y_train10_red[i*500:i*500+500] = y[0:500]

With the command concatenate from numpy, the two data sets – our modified versions
of CIFAR-10 and CIFAR-100 – can now be linked.
X_train = np.concatenate((X_train10_red, X_train100))
y_train = np.concatenate((y_train10_red, y_train100))

The two arrays now each have a length of 5,500 entries. To see how the two datasets have
been merged, entries 4999 to 5023 of the newly created dataset can be viewed. In
Figure 18.19 it can be seen that the data of the modified dataset CIFAR-100 has been
appended to the data of the dataset CIFAR-10. In addition, because of the way the
CIFAR-10 data was reduced, its categories now appear in sorted order.

Figure 18.19.: Visualisation of the transition between the two data sets

To correct this, the created data set is shuffled. It is important that images and labels
are shuffled in the same way so that they still fit together. For this, the auxiliary
variable shuffler is defined, which holds a random permutation that is applied
identically to both arrays.

shuffler = np.random.permutation(len(X_train))

X_train = X_train[shuffler]

y_train = y_train[shuffler]

Then the same section of the dataset looks like in Figure 18.20.

Figure 18.20.: Visualisation of the transition in the generated data set

Finally, a OneHotVector representation is created from the labels:

y_train = tf.keras.utils.to_categorical(y_train)

The data is now in the following form:

print(X_train.shape)
print(y_train.shape)
print(y_train[1])

(5500, 32, 32, 3)


(5500, 11)
[0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]

The same steps for modifying and merging the datasets must also be carried out
for the test data. The code for creating the dataset, without the additional steps for
visualisation, is shown below in Listing 18.21.
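For illustration, the corresponding test-data steps can be outlined first; this is a minimal sketch that assumes the reduced arrays X_test10_red and y_test10_red have been built analogously to the training data (the variable names follow the listing):

import numpy as np
import tensorflow as tf

# merge the reduced CIFAR-10 test data with the filtered 'bottles' test data from CIFAR-100
X_test = np.concatenate((X_test10_red, X_test100))
y_test = np.concatenate((y_test10_red, y_test100))

# shuffle images and labels with one and the same random permutation
shuffler = np.random.permutation(len(X_test))
X_test = X_test[shuffler]
y_test = y_test[shuffler]

# one-hot encode the labels (11 categories)
y_test = tf.keras.utils.to_categorical(y_test)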

#!/usr/bin/env python
# coding: utf-8

# Script for creating a combined dataset of CIFAR-10 and CIFAR-100
# Attaching the category bottles from CIFAR-100 to CIFAR-10
# Written by C. Joachim
# January 2021

# In[ ]:

import numpy as np
import tensorflow as tf
from tensorflow.keras import datasets

# In[ ]:

# import the CIFAR-10 and CIFAR-100 datasets
(X_train10, y_train10), (X_test10, y_test10) = \
    tf.keras.datasets.cifar10.load_data()
(X_train100, y_train100), (X_test100, y_test100) = \
    tf.keras.datasets.cifar100.load_data()

# In[ ]:

# reduce CIFAR-100 to the category 'bottles' and change the label
idx = (y_train100 == 9).reshape(X_train100.shape[0])
X_train100 = X_train100[idx]
y_train100 = y_train100[idx]
for i in range(len(y_train100)):
    y_train100[i] = 10

idx = (y_test100 == 9).reshape(X_test100.shape[0])
X_test100 = X_test100[idx]
y_test100 = y_test100[idx]
for i in range(len(y_test100)):
    y_test100[i] = 10

# In[ ]:

# reduce CIFAR-10 to 500 training and 100 test images per category,
# to have the same number of images for every category
X_train10_red = [None]*5000
y_train10_red = [None]*5000

for i in range(10):
    idx = (y_train10 == i).reshape(X_train10.shape[0])
    x = X_train10[idx]
    y = y_train10[idx]
    X_train10_red[i*500:i*500+500] = x[0:500]
    y_train10_red[i*500:i*500+500] = y[0:500]

X_test10_red = [None]*1000
y_test10_red = [None]*1000

for i in range(10):
    idx = (y_test10 == i).reshape(X_test10.shape[0])
    x = X_test10[idx]

18.8.4. Structure and Training of the Model


The training is to be carried out with the previously created data set, but also with the
complete CIFAR-10 data set for comparison purposes. In addition, different network
architectures – on the one hand the network known from the MNIST tutorial and
on the other hand the architecture of AlexNet – are to be used and different batch
sizes tested. Therefore, an input of the parameters is provided at the beginning of the
source code, where dataset=1 stands for the CIFAR-10 dataset and dataset=2
for the custom dataset, and model=1 for the MNIST model and model=2 for the AlexNet
architecture. The number of epochs is 100, and at individual points it is tested
whether doubling the number of epochs changes the result. According to various
sources, it is advisable to first select 32 as the batch size and additionally to try out
further powers of two (64, 128, 256).
In the first step, based on the selection of the data set, the images are loaded with the
corresponding labels. Then the model specified in the parameter selection can be built
and the data prepared accordingly. If model=1 was selected, a simple neural network
is built as described in the MNIST tutorial in [RA20] (on p.68). For this, the images
are normalised into the value range 0 to 1 in advance.
If model=2 was selected, AlexNet is built as described in [Ala08]. The preparation
of the datasets is somewhat more abbreviated than in that reference, because
if tf.data.Dataset.from_tensor_slices is used in the training, error messages
occur with the versions and configurations used. Instead, the image data remain
in the tensor representation. These are normalised to a mean of zero and a standard
deviation of one using the built-in function tf.image.per_image_standardization
and resized to the AlexNet default size of 227×227.
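This preprocessing step can be sketched as follows; this is a minimal sketch using the variable names from the listings and assumes that X_train holds the image tensor:

# normalise every image to zero mean and unit standard deviation
X_train = tf.image.per_image_standardization(X_train)
# resize to the AlexNet input size of 227x227
X_train = tf.image.resize(X_train, (227, 227))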
For model 1, compiling the model with optimiser and loss function is also done as
in the reference. For AlexNet, the optimiser tf.optimizers.SGD(lr=0.001) is used,
again as in the reference. SGD stands for „stochastic gradient descent“. The parameter
lr specifies the learning rate, which determines how large the steps are by which the
weights are adjusted along the gradient.
The creation of the logfiles is done as before; only the file name is composed of the
parameters chosen at the beginning of the programme run, in order to be able
to assign the logs afterwards. In the training itself, the only difference is that for model
1 the validation data is split off using the argument validation_split, while for
model 2 the validation data was defined in advance according to the template used
(the last 20 % of the training data) and is passed to the training as validation_data.
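The difference between the two variants can be outlined with a minimal sketch; here model stands for the compiled network, tensorboard_callback for the previously created logging callback, and the 20 % split for model 2 is only indicated schematically:

# model 1: let Keras split off the validation data itself
history = model.fit(X_train, y_train,
                    epochs=Epochs, batch_size=Batch,
                    validation_split=0.2,
                    callbacks=[tensorboard_callback])

# model 2: validation data defined in advance (last 20 % of the training data)
split = int(0.8 * len(X_train))
history = model.fit(X_train[:split], y_train[:split],
                    epochs=Epochs, batch_size=Batch,
                    validation_data=(X_train[split:], y_train[split:]),
                    callbacks=[tensorboard_callback])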
After completion of the training, the model is evaluated with the test data; the result
of this evaluation, the elapsed training time and the model itself – the latter as
saved_model.pb – are saved, again specifying the selected parameters in the corresponding
folder name. The programme Training_CIFAR.py used for the training as described can
be found in full in Listing 18.22.

18.8.5. History of the Training and Visualisation with TensorBoard


The courses of the training visualised with TensorBoard for the different selected
parameters can be seen in table 18.1 for the data set CIFAR-10 and in table 18.2 for
the modified data set.

Table 18.1.: Visualisation of the training progress (accuracy and loss plots) with the data set CIFAR-10:
Dataset 1, Model 1, Epochs 100, Batch Size 32
Dataset 1, Model 2, Epochs 100, Batch Size 32

Table 18.2.: Visualisation of the training courses (accuracy and loss plots) with the modified data set:
Dataset 2, Model 1, Epochs 100, Batch Size 32
Dataset 2, Model 1, Epochs 200, Batch Size 32
Dataset 2, Model 1, Epochs 100, Batch Size 64
Dataset 2, Model 1, Epochs 100, Batch Size 128
Dataset 2, Model 1, Epochs 100, Batch Size 256
Dataset 2, Model 2, Epochs 100, Batch Size 32
Dataset 2, Model 2, Epochs 200, Batch Size 32
Dataset 2, Model 2, Epochs 100, Batch Size 64
Dataset 2, Model 2, Epochs 100, Batch Size 128
Dataset 2, Model 2, Epochs 100, Batch Size 256

Table 18.3 shows the values of the accuracy and loss function at the end of the training
for training, validation and the evaluation following the training with the test data.

Table 18.3.: Model quality and training time for different parameters: D-dataset,
M-model, E-epochs, B-batch size
D M E B | train. accuracy | train. loss | val. accuracy | val. loss | test accuracy | test loss | training time [min]
1 1 100 32 0.9498 0.1522 0.7109 1.586 0.7066 1.567 65.1
1 2 100 32 0.9937 0.0219 0.7356 1.323 0.7316 1.363 1,366.2
2 1 100 32 0.9782 0.0647 0.5482 3.441 0.5367 3.306 7.4
2 1 200 32 0.9914 0.0299 0.5309 4.229 0.5224 4.350 14.8
2 1 100 64 0.9861 0.0460 0.5564 2.990 0.5427 3.157 6.1
2 1 100 128 0.9793 0.0663 0.5355 2.994 0.5392 3.060 5.6
2 1 100 256 0.9616 0.1003 0.5491 2.642 0.5543 2.602 5.2
2 2 100 32 0.9932 0.0413 0.5836 1.598 0.5582 1.730 152.3
2 2 200 32 0.9989 0.0094 0.6073 1.715 0.5578 1.954 303.0
2 2 100 64 0.9884 0.0548 0.5773 1.562 0.5603 1.679 151.9
2 2 100 128 0.9891 0.0489 0.6036 1.537 0.5602 1.686 151.4
2 2 100 256 0.9857 0.0570 0.5936 1.561 0.5569 1.727 154.6

As might be expected, the full CIFAR-10 dataset achieves higher accuracy when
trained with AlexNet, as it was trained with many more images. For the simpler
model, the small dataset seems to achieve high accuracy in training, but there seems
to be overfitting because the test accuracy for the small, modified dataset is much
lower than the training accuracy. At the same time, training with the full CIFAR-10
dataset on the AlexNet architecture requires a training time of almost a full day with
the specifications RAM 64 GB, Intel i9 9900k processor, 8 cores, 3600 MHz.
It can be seen that training with 100 epochs is absolutely sufficient. The results vary
slightly with the batch size, but no clear pattern can be seen here.
The accuracies achieved are not particularly high. This is not surprising, however,
when one considers the reduced training effort and makes a comparison with the
results achieved by the AlexNet developers with AlexNet. Here, too, despite the use of
over 1.2 million high-resolution images with the ImageNet dataset, a top-1 error rate
of 37.5 % was still observed. In this respect, the results achieved here with TensorFlow
with fewer low-resolution images are plausible if not satisfactory.

18.9. Saving and Restoring a Model in HDF5 Format

As a rule, particularly powerful hardware is used to train a neural network. After the
training has been completed, the results can be used on other systems, for example
on a Jetson Nano. It is therefore necessary to know mechanisms by which the results
can be saved and restored. In this section, following the guide https://www.tensorflow.org/
tutorials/keras/save_and_load, a simple example shows how a model is saved and
loaded again. [Goo19b; Sta19]
There are various formats for saving models [Onn; Jun+19; Fol+11; Hdf].
Here we use the HDF5 format, as it is supported by TensorFlow.

#!/usr/bin/env python
# coding: utf-8

# Script for Training with standard and manipulated CIFAR10 dataset
# (see Script 'Dataset.py') with a simple CNN and AlexNet
#
# Written by C. Joachim based on different sources (see references in code)
# January 2021
#
# If the manipulated dataset is to be used, run this script in advance
# such that the dataset can be loaded

# In[ ]:

import tensorflow as tf
import time
from tensorflow import keras

# In[ ]:

# Choose parameters for training
Dataset = 1   # Dataset=1 for standard CIFAR10, Dataset=2 for own dataset
Model = 1     # Model=1 for simple CNN, Model=2 for AlexNet
Epochs = 100  # choose number of epochs
Batch = 256   # choose Batch Size (standard is 32, 64, 128, 256)

# In[ ]:

# import dataset
if Dataset == 1:
    # load CIFAR10
    from tensorflow.keras import datasets
    (X_train, y_train), (X_test, y_test) = tf.keras.datasets.cifar10.load_data()
    # 10 categories --> 10 output neurons needed
    cat = 10
if Dataset == 2:
    # load manipulated CIFAR10 dataset (already normalized and OneHot-encoded)
    get_ipython().run_line_magic('store', '-r')
    (X_train, y_train, X_test, y_test) = Data_CIFAR
    # 11 categories --> 11 output neurons needed
    cat = 11

# In[ ]:

if Model == 1:
    # use Model from MNIST tutorial from 'c't Python-Projekte' by Heise
    # with modified input size and adjustable number of output neurons
    #
    # if the simple CNN is chosen, normalize data
    X_train = X_train/255
    X_test = X_test/255
    # if the Dataset is original CIFAR-10, we still need to
    # convert to OneHot-Vector
    if Dataset == 1:
        y_train = tf.keras.utils.to_categorical(y_train)
        y_test = tf.keras.utils.to_categorical(y_test)

18.9.1. Required Libraries

To be able to save models in HDF5 format, the library h5py and access to files are
required. The library must be installed with the command
pip install -q pyyaml h5py
In the Python program, access to the file system must be enabled and the sub-package
keras of the TensorFlow package must be made available.
import os
import tensorflow as tf
from tensorflow import keras

18.9.2. Saving during Training

Intermediate results can be saved during training. This can be useful if a training run
has to be aborted: when the training is resumed, the results already obtained can be
used to shorten the computing time.
For this intermediate saving, callback functions have to be created. At specially
defined points in the program, so-called checkpoints, the weights are then saved into a
collection of checkpoint files, which are updated at the end of each epoch. First, a
checkpoint file must be created:
checkpoint_path = "training_1/cp.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)
It is also possible to name the checkpoints uniquely by epoch; the input then changes to
checkpoint_path = "training_2/cp-{epoch:04d}.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)
After that, the callback function is defined. Its parameters are fixed; the content has
to be adapted accordingly.
# Create a callback that saves the model’s weights
cp_callback = tf.keras.callbacks.ModelCheckpoint(
filepath=checkpoint_path,
save_weights_only=True,
verbose=1)
If the weights are only to be saved every 5 epochs, for example, the callback is created
as follows:
# Create a callback that saves the model’s weights every 5 epochs
cp_callback = tf.keras.callbacks.ModelCheckpoint(
filepath=checkpoint_path,
verbose=1,
save_weights_only=True,
save_freq=5*batch_size)
# Save the weights using the ’checkpoint_path’ format
model.save_weights(checkpoint_path.format(epoch=0))
To save the weights during training, the callback function must then be passed to
the function model.fit():
# Train the model with the new callback
model.fit(train_images,
          train_labels,
          epochs=10,
          validation_data=(test_images, test_labels),
          callbacks=[cp_callback])  # Pass callback to training

A warning message may be generated in this way, but it can be ignored.
The weights can be shared between two models with the same architecture via the
checkpoint file that has been created. If model is a second model with the same
architecture as the model whose weights were saved, the weights from the checkpoint
are loaded into the new model as follows:

model.load_weights(checkpoint_path)

18.9.3. Saving Weights

Once the weights of a model have been computed, they can be saved at the end. The
following command can be used for this.

model.save_weights('./checkpoints/my_checkpoint')

By default, saving in this way uses the extension .ckpt.

18.9.4. Saving the Entire Model

Using the function model.save, the architecture, weights and training configuration
of a model can be saved in a single file. There are two file formats for saving an entire
model. The default in TensorFlow is SavedModel, a proprietary file format. However,
it is also possible to save models in the HDF5 format.

SavedModel Format

Saving in the SavedModel format takes only two lines.

!mkdir -p saved_model
model.save('saved_model/my_model')

Saving a model is only useful if it can be used again later. The model can be reloaded
from the saved files:

new_model = tf.keras.models.load_model('saved_model/my_model')

HDF5 Format

To save the model in HDF5 format, the corresponding extension h5 is specified. Only
one line is necessary:

model.save('my_model.h5')

The model is loaded accordingly with the file name including the extension h5:

new_model = tf.keras.models.load_model('my_model.h5')

18.10. Using a Model

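The use of a previously saved model for a single prediction can be outlined with a minimal sketch; the file names my_model.h5 and images/test.jpg as well as the input size 227×227 are placeholders and have to be adapted to the model that was actually trained:

import numpy as np
import tensorflow as tf

# load the saved model (HDF5 format, see Section 18.9.4)
model = tf.keras.models.load_model('my_model.h5')

# load and prepare an image
# note: the preprocessing must match the one used during training
img = tf.io.read_file('images/test.jpg')
img = tf.image.decode_jpeg(img, channels=3)
img = tf.image.resize(img, (227, 227))   # resize to the model's input size
img = tf.expand_dims(img, 0)             # add a batch dimension

# run the model and read out the most probable class
prob = model.predict(img)
print('Predicted class:', np.argmax(prob), 'with confidence', np.amax(prob))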
18.11. Optimising the Training Time

As neural networks become larger and more complex, the training times increase.
It is therefore worthwhile to try to reduce the training times. Several starting points
are available for this:

1. tf.data

2. Mixed Precision Training

3. Multi-GPU Training Strategy

Overall, the training time can be accelerated by a factor of 2 to 10. [Meu20]

18.11.1. Identifying the Bottleneck

When using an NVIDIA graphics card, the simplest way to monitor the GPU utilisation
over time is probably the tool nvtop. A possible output is shown in Figure 18.21.

Figure 18.21.: Visualisation of bottlenecks with nvtop. Source: [Meu20]

To visualise the utilisation with TensorBoard,

profile_batch={BATCH_INDEX_TO_MONITOR}

must be added to the TensorBoard callback. A complete report is then created of the
operations performed by either the CPU or the GPU for the given batch. This can
help to recognise whether the processor is blocked at a certain point due to missing data.
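A minimal sketch of such a profiling callback is given below; the log directory logs/profile and the batch index 5 are only example values, and a compiled model with training data as in the previous sections is assumed:

import tensorflow as tf

# create a TensorBoard callback that additionally profiles one batch
tensorboard_callback = tf.keras.callbacks.TensorBoard(
    log_dir="logs/profile",
    profile_batch=5)

# pass the callback to the training as usual
model.fit(X_train, y_train, epochs=1, callbacks=[tensorboard_callback])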

Figure 18.22.: Visualisation of bottlenecks with TensorBoard. Source: [Meu20]

18.11.2. tf.data

To keep the GPU working continuously, the bottleneck of loading the data must be
eliminated. In Figure 18.23 it can be seen that the GPU is idle while the CPU is busy
loading data.

Figure 18.23.: Bottleneck when loading data. Source: [Meu20]

To eliminate this bottleneck, the first step should be to switch from Keras sequences
to tf.data. After that, further optimisations can be made:

1. Parallelisation: with num_parallel_calls=tf.data.experimental.AUTOTUNE, the
calls of the function .map() are parallelised.

2. Cache: the cache can be used to keep loaded images in memory. This can be done
by caching the datasets before the patch selection.

3. Prefetching: elements are fetched before the previous batch has been finished.

The following example shows optimised code for creating a dataset of sharp and
blurred images:

18.11.3. Mixed Precision Training

By default, all variables are stored as float32, i.e. with 32 bits. Mixed precision
training is based on the insight that this high precision is not always needed, so that
16 bits can be used in some places. The weights themselves continue to be stored as
float32, but the forward and backward propagation use the float16 elements. The
accuracy is not degraded by this; in some cases it is even increased.
Mixed precision training can easily be implemented in TensorFlow versions from 2.1.0
onwards. It can be activated with these two lines before the model is instantiated:
policy = tf.keras.mixed_precision.experimental.Policy('mixed_float16')
tf.keras.mixed_precision.experimental.set_policy(policy)

18.11.4. Multi-GPU Training

The easiest way to perform multi-GPU training is to use the MirroredStrategy. It
instantiates the model on each GPU. At each step, different batches are sent to the
GPUs, which run the backward pass. Then the gradients are aggregated to perform
an update of the weights, and the updated values are propagated to each instantiated
model.
With TensorFlow 2.0, the distribution strategy is again quite simple. The only thing
to remember is that the usual batch size has to be multiplied by the number of
available GPUs:
# Define multi-gpu strategy
mirrored_strategy = tf.distribute.MirroredStrategy()
# Update batch size value
batch_size *= mirrored_strategy.num_replicas_in_sync
# Create strategy scope to perform training
with mirrored_strategy.scope():
    model = [...]
model.fit(...)

18.12. Optimising Multithreading

TensorFlow's multithreading can be used for optimisation.

• tf.config.threading:
  – set_intra_op_parallelism_threads: sets the number of threads used for
    parallelism within a single operation.
  – set_inter_op_parallelism_threads: sets the number of threads used for
    parallelism between independent operations.

• OpenMP runtime (activated via os.environ[. . . ]):

# dataset creation for optimizing training time
# from https://www.kdnuggets.com/2020/03/tensorflow-optimizing-training-tim

from pathlib import Path

import tensorflow as tf


def select_patch(sharp, blur, patch_size_x, patch_size_y):
    """
    Select a patch on both sharp and blur images at the same localization.
    Args:
        sharp (tf.Tensor): Tensor for the sharp image
        blur (tf.Tensor): Tensor for the blur image
        patch_size_x (int): Size of patch along x axis
        patch_size_y (int): Size of patch along y axis
    Returns:
        Tuple[tf.Tensor, tf.Tensor]: Tuple of tensors with shape (patch_size_x, patch_size_y, 3)
    """
    stack = tf.stack([sharp, blur], axis=0)
    patches = tf.image.random_crop(stack, size=[2, patch_size_x, patch_size_y, 3])
    return (patches[0], patches[1])


class TensorflowDatasetLoader:
    def __init__(self, dataset_path, batch_size=4, patch_size=(256, 256),
                 n_epochs=None, n_images=None, dtype=tf.float32):
        # List all images paths
        sharp_images_paths = [str(path) for path in Path(dataset_path).glob("*/sharp/*.png")]
        if n_images is not None:
            sharp_images_paths = sharp_images_paths[0:n_images]

        # Generate corresponding blurred images paths
        blur_images_paths = [path.replace("sharp", "blur") for path in sharp_images_paths]

        # Load sharp and blurred images
        sharp_dataset = tf.data.Dataset.from_tensor_slices(sharp_images_paths).map(
            lambda path: self.load_image(path, dtype),
            num_parallel_calls=tf.data.experimental.AUTOTUNE,
        )
        blur_dataset = tf.data.Dataset.from_tensor_slices(blur_images_paths).map(
            lambda path: self.load_image(path, dtype),
            num_parallel_calls=tf.data.experimental.AUTOTUNE,
        )

        dataset = tf.data.Dataset.zip((sharp_dataset, blur_dataset))
        dataset = dataset.cache()

        # Select the same patch on the sharp image and its corresponding blurred image
        dataset = dataset.map(
            lambda sharp_image, blur_image: select_patch(
                sharp_image, blur_image, patch_size[0], patch_size[1]
            ),
            num_parallel_calls=tf.data.experimental.AUTOTUNE,
        )

        # Define dataset characteristics (batch_size, number_of_epochs, shuffling)
        dataset = dataset.batch(batch_size)
        dataset = dataset.shuffle(buffer_size=50)
        dataset = dataset.repeat()
        dataset = dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)

        self.dataset = dataset

    @staticmethod

  – OMP_NUM_THREADS: number of threads available to OpenMP.
  – KMP_BLOCKTIME: the time of a spinlock, measured in milliseconds.
  – KMP_AFFINITY: defines the assignment of threads to the hardware resources.

The value OMP_NUM_THREADS is set to the number of available cores, and the parameter
KMP_BLOCKTIME is set to a small value (the larger the network, the smaller the value).

Setting the affinity:

os.environ["KMP_AFFINITY"]="granularity=fine,compact,1,0,verbose"
19. TensorFlow Lite

19.1. TensorFlow Lite

“TensorFlow Lite is an open-source, product ready, cross-platform deep learning
framework that converts a pre-trained model in TensorFlow to a special format that can
be optimized for speed or storage” [Kha20]. The model in this special format can be deployed
on edge devices such as mobile phones, embedded devices or microcontrollers. Through quantization
and weight pruning, TensorFlow Lite achieves optimisation and reduces the size of
the model or improves the latency. Figure 19.1 shows the workflow of TensorFlow Lite.

19.2. What is TensorFlow Lite?

Often the trained models are not to be used on the PC on which they were trained,
but on mobile devices such as mobile phones, microcontrollers or other embedded
systems. However, only limited computing and storage capacities are available there.
TensorFlow Lite is a complete framework that allows the conversion of models into a
special format and their optimisation in terms of size and speed, so that the models
can then be run with the TensorFlow Lite interpreter on mobile, embedded and IoT
devices. [Goo20a][WS20]
TensorFlow Lite is a „light“ version of TensorFlow designed for mobile and embedded
devices. It achieves low-latency inference in a small binary size – both the TensorFlow
Lite interpreter and the TensorFlow Lite models are much smaller. However, you cannot
train a model directly with TensorFlow Lite; it must first be created with TensorFlow,
and then you must convert the model from a TensorFlow file to a TensorFlow Lite file
using the TensorFlow Lite converter. [Goo11b] The reason is that TensorFlow models
usually calculate with 32-bit floating point numbers in the neural networks. The weights
can assume very small values close to zero. Therefore, the models cannot be run on
systems that cannot calculate with long floating point numbers, but only with 8-bit
integers. This is especially the case with small AI processors, on which only a few
transistors can be accommodated due to size and power consumption. Therefore, the
neural network must undergo a transformation, „in which long floating-point numbers
with varying precision over the representable range of values – floating-point numbers
represent the range of numbers around the zero point much more finely than very
large or small values – become short integers with constant precision in a limited
range of values“ [RA20]. This quantisation takes place during the conversion to the
TensorFlow Lite format.

19.3. Procedure
The process for using TensorFlow Lite consists of four steps:
1. Choice of a model:
An existing, trained model can be adopted as it is or retrained, or a model can
be created from scratch and trained.
2. Conversion of the model:
If a custom model is used that is accordingly not in TensorFlow Lite format, it
must be converted to the format using the TensorFlow Lite converter.


[Figure: workflow steps – train a model, convert the model, optimize the model, deploy the model at the edge (microcontrollers, embedded devices, mobile devices), make inferences at the edge]

Figure 19.1.: Tensorflow Lite workflow [Kha20]

3. Deployment on the end device:

To run the model on the device, the TensorFlow Lite interpreter is available
with APIs in many languages.

4. Optimisation of the model:


Other optimisation tools are available to improve model size and execution speed.
For example, the model can be quantised by converting 32-bit floating point
numbers to more efficient 8-bit integers. With the TensorFlow Lite converter,
TensorFlow models are converted into an optimised FlatBuffer format so that
they can be used by the TensorFlow Lite interpreter. [Goo19c]

The figure 19.2 illustrates the basic process for creating a model that can be used on
an edge computer. Most of the process uses standard TensorFlow tools.

[Figure: TensorFlow side – definition of a 32-bit model, training (trained 32-bit model), quantisation (trained 8-bit model), export of the model in the .pb file format; conversion; edge computer side – import of the model in the .tflite file format, deployment of the trained 8-bit model on the edge computer hardware]

Figure 19.2.: The basic process for creating a model for an edge computer[Goo19a]

19.4. TensorFlow Lite Converter


The TensorFlow Lite Converter is a tool available as a Python API that converts
trained TensorFlow models into the TensorFlow Lite format. This is done by converting
the Keras model into the form of a FlatBuffer, a special space-saving file format. In

general, converting models reduces file size and introduces optimisations without
compromising accuracy. The TensorFlow Lite converter continues to offer options to
further reduce file size and increase execution speed, with some trade-offs in accuracy.
[Goo20a][WS20]
One way to optimise is quantisation. While the weights of the models are usually
stored as 32-bit floating point numbers, they can be converted to 8-bit integers with
only a minimal loss in the accuracy of the model. This saves memory and also allows
for faster computation, as a CPU works more efficiently with integers. [WS20]

19.5. TensorFlow Lite Interpreter


The TensorFlow Lite interpreter is a library that executes an appropriately converted
TensorFlow Lite model using the most efficient operations for the given device, applying
the operations defined in the model to the input and providing access to the output.
It works across platforms and provides a simple API to run TensorFlow Lite models
from Java, Swift, Objective-C, C++ and Python. [Goo20a][WS20]
To use hardware acceleration on different devices, the TensorFlow Lite interpreter can
be configured with „delegates“. These use device accelerators such as a GPU or a
digital signal processor (DSP). [Goo20a]

19.6. TensorFlow Lite and Jetson Nano


19.6.1. Compatibility
Compared to other AI systems such as the Edge TPU, NVIDIA’s Jetson Nano is more
flexible in terms of model properties. As [SC21] shows, 16- and 32-bit models can
be used on the Jetson Nano. Therefore, there are many different model formats that
are compatible with the Jetson Nano. For use on the Jetson Nano, models are often
converted from Keras models in the HDF5 format, or from TensorFlow protocol buffer
files with the extension .pb, into the format originally supported by TensorRT with
the extension .uff. TensorFlow Lite is optimised for use on ARM CPU edge
devices [Jai+20]. Since the Jetson Nano also uses an ARM Cortex-A57 processor,
using TensorFlow Lite on the Jetson Nano is an obvious choice.

19.6.2. Conversion of the model


It is recommended to convert a model in SavedModel format. After training, the
model should therefore be saved accordingly. The following steps can then be carried
out:
import tensorflow as tf
# Convert the model
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
# with the path to the SavedModel directory
tflite_model = converter.convert()
# Save the model.
with open('model.tflite', 'wb') as f:
f.write(tflite_model)

This converts the saved model into the format tflite and saves it in the specified
location with the specified name. [Goo11a] The flag ’wb’ merely specifies that it is
written in binary mode.
In addition, this model can now be optimised as already described. For this purpose,
the optimisation must be specified [WS20] before the actual conversion:
200

converter.optimizations = [tf.lite.Optimize.DEFAULT]

In addition to the DEFAULT variant, there are also options to specifically optimise the
size or latency. However, according to [Ten21], these are outdated, so DEFAULT should
be used.
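Putting the two steps together, a minimal sketch of a conversion with the DEFAULT optimisation (saved_model_dir again stands for the path to the SavedModel directory) looks like this:

import tensorflow as tf

# convert the SavedModel with the default optimisation (quantisation)
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# save the optimised model
with open('model_optimized.tflite', 'wb') as f:
    f.write(tflite_model)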

19.6.3. Transfer of the model

A convenient way to transfer models and other large files between a PC and the Jetson
Nano is to set up a Secure Shell FileSystem (SSHFS). This allows access to the Jetson
Nano's folders from the PC. The installation of such a system for Windows is explained
at https://www.digikey.com/en/maker/projects/getting-started-with-the-nvidia-jetson-nano-part-1-setup/2f497.
First, the latest version of the programme winfsp must be downloaded:
https://github.com/billziss-gh/winfsp/releases
Then the programme SSHFS-Win is installed:
https://github.com/billziss-gh/sshfs-win/releases
Then the Explorer/File Manager can be opened in Windows. After right-clicking on
'This PC', select 'Add network address'. The following must be entered as the network
address:
\\sshfs\<username>@<IP address>

Here <username> is the user name used on the Jetson Nano and <IP address> is its
IP address. The IP address can be displayed in a terminal on the Jetson Nano using
the command
$ ip addr show

19.6.4. Using a model on Jetson Nano


The TensorFlow Lite model can be used in different environments. On the Jetson
Nano, it is best to use the Python environment. To use the TensorFlow Lite interpreter,
TensorFlow must be installed on the Jetson Nano. The installation of TensorFlow on
the Jetson Nano is described in Chapter ??.
At the beginning of a program that is to evaluate a model in tflite format, TensorFlow
must be imported. Then the model should be loaded by specifying the memory path
and the tensors should be associated:
import tensorflow as tf
interpreter = tf.lite.Interpreter(model_path="converted_model.tflite")
interpreter.allocate_tensors()

Then the shape/size of the input and output tensors can be determined so that
subsequently the data can be adjusted accordingly.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

With this information, the input data – in this example (??) a random array – can be
adjusted and set as the input data:
input_shape = input_details[0]['shape']
input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], input_data)
Chapter 19. 201

Then the interpreter is executed and determines the result of the model
interpreter.invoke()

Finally, the output of the model can be determined and read out:
output_data = interpreter.get_tensor(output_details[0]['index'])

If the data is to be further processed or displayed, additional libraries, for example


numpy, must be loaded.
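Put together, the steps described above result in the following minimal sketch; the model path converted_model.tflite and the random input data are only illustrative:

import numpy as np
import tensorflow as tf

# load the converted model and allocate the tensors
interpreter = tf.lite.Interpreter(model_path="converted_model.tflite")
interpreter.allocate_tensors()

# determine the shapes of the input and output tensors
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# prepare the input data and assign it to the input tensor
input_shape = input_details[0]['shape']
input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], input_data)

# run the interpreter and read out the result
interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)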

19.7. Application example: tflite model for bottle detection with a combined data set from CIFAR-10 and CIFAR-100
The model trained in ?? with a combined dataset of CIFAR-10 and CIFAR-100 shall
be transformed into .tflite format and used on the Jetson Nano for image classification.
This is intended to provide an example of a possible application of a network trained
on the work PC on an edge computer in the field to enable detection of bottles located
in front of the camera. To do this, the previously trained model must be converted to
.tflite for use on the Jetson Nano.

19.7.1. Conversion to format tflite


The selected model is converted into the tflite format with the tflite converter. For this
purpose, the two following programmes, Conversion.py and Conversion_Opt.py
were created and run on the work PC. In these programmes, the model can be selected
based on the parameters used in the training (dataset, model, epochs and batch
size) and then converted to .tflite format with optimisation (Conversion_Opt.py) or
without.
The conversion with optimisation reduces the size of the file in tflite format by about
75 %, as can be seen in Table 19.1. The optimisation is also supposed to increase the
speed, at the cost of a slight loss of accuracy. Whether this is confirmed will be clarified
in the following tests, among others.

Table 19.1.: File sizes of the different models/formats

Parameter | .pb | .tflite without optimisation | .tflite with optimisation
Dataset=2, Model=2, Epochs=100, Batch Size=32 | 452 kB | 225 MB | 56.4 MB
Dataset=1, Model=2, Epochs=100, Batch Size=32 | 452 kB | 225 MB | 56.4 MB

19.7.2. Test with any images


An indication of the accuracy, in terms of correct classifications, of the models trained
with the images from the dataset CIFAR-10 and with the combination of the datasets
CIFAR-10 and CIFAR-100 is given by the validation and test accuracy as well as the
corresponding losses.
Since in a real application it would hardly be possible to use such carefully selected
images of uniform format and resolution, the performance of the models is to be tested

#!/usr/bin/env python
# coding: utf-8

# Script for converting a trained model (see Script Training_CIFAR.py)
# to a TensorFlow-Lite-Model without optimization
#
# Based on https://www.tensorflow.org/api_docs/python/tf/lite/TFLiteConverter

# In[2]:

import tensorflow as tf

# In[1]:

# Choose parameters of the trained model to be converted (see script Training_CIFAR.py)
Dataset = 2
Model = 2
Epochs = 100
Batch = 32

# In[3]:

# Convert the model (the folder name is composed of the chosen parameters)
converter = tf.lite.TFLiteConverter.from_saved_model(
    "saved_model/" + "Dataset %s Model %s Epochs %s Batch %s" % (Dataset, Model, Epochs, Batch))
tflite_model = converter.convert()

# In[4]:

# Save the model.
with open('model1.tflite', 'wb') as f:
    f.write(tflite_model)

# In[ ]:

#!/usr/bin/env python
# coding: utf-8

# Script for converting a trained model (see Script Training_CIFAR.py)
# to a TensorFlow-Lite-Model with optimization
#
# Based on https://www.tensorflow.org/api_docs/python/tf/lite/TFLiteConverter

# In[ ]:

import tensorflow as tf

# In[ ]:

# Choose parameters of the trained model to be converted (see script Training_CIFAR.py)
Dataset = 2
Model = 2
Epochs = 100

under conditions closer to the application, using a few more or less arbitrary images
selected from the Google Image Search for the various categories.

Five photos in .jpg or .jpeg format per category are selected from the internet and, to
a small extent, from the photos contained in the project jetson-inference (https://github.
com/dusty-nv/jetson-inference). Care is taken to ensure that the images are not too
similar, so that the results are approximately representative and meaningful despite
the small number of images per category.

The selected images are shown in Figure 19.3; to facilitate the search for comparable
photos, a reference to their sources – the links to the images in the Google Image
Search – is given in Table 19.2.

Figure 19.3.: Test images (five test images each for the categories airplane, auto, bird, cat, deer, dog, frog, horse, ship, truck and bottle)



Table 19.2.: Image sources


airplane0.jpg https://images.app.goo.gl/miFysGdgzfattLYS8
airplane1.jpg https://images.app.goo.gl/21sxybSv9tgDv7pv9
airplane2.jpg https://images.app.goo.gl/gHEDpv2jqmgoUSEcA
airplane3.jpg https://images.app.goo.gl/Qc3iygkmB4qxW5eF8
airplane4.jpg https://images.app.goo.gl/jCnZcfBZndtGJkR38
auto0.jpg https://images.app.goo.gl/uzCTYpdSBksw57uv5
auto1.jpg https://images.app.goo.gl/hU6kNTPBJ5PnZq4Y6
auto2.jpg https://images.app.goo.gl/Gkfge72thV9R4oGx7
auto3.jpg https://images.app.goo.gl/1SCXdXxuXPCD6VYR8
auto4.jpg https://images.app.goo.gl/U9R26N43Tvn6jfXV6
bird0.jpg https://images.app.goo.gl/9oV7uhCzQtHbJYMf6
bird1.jpg https://images.app.goo.gl/ThFr8msmptb1cFpe9
bird2.jpg https://images.app.goo.gl/JhjHuVZcFGt5EvNb6
bird3.jpg https://images.app.goo.gl/g3vPKULecoVKoJC2A
bird4.jpg https://images.app.goo.gl/LKCTmDhy23WzMUjt7
cat0.jpg jetson inference
cat1.jpg jetson inference
cat2.jpg https://images.app.goo.gl/fg41HZ1oNSPyeu649
cat3.jpg jetson inference
cat4.jpg https://images.app.goo.gl/44PPHu2T88jUupBJ8
deer0.jpg https://images.app.goo.gl/XS56GYx4YWMKCjU8A
deer1.jpg https://images.app.goo.gl/uSjwNHSzS5VF7Xz29
deer2.jpg https://images.app.goo.gl/H6mwvVxDQfDb51vC6
deer3.jpeg https://images.app.goo.gl/ADDGt6FwvFAb5STM6
deer4.jpg https://images.app.goo.gl/X9m7Apd7Na8iBy4u8
dog0.jpg jetson-inference
dog1.jpg https://images.app.goo.gl/BQm3FtiCe2yAoWPi7
dog2.jpg https://images.app.goo.gl/T1ZR6WCjJpHGNiAC7
dog3.jpg https://images.app.goo.gl/wU5MWm2YbHeGZWHW9
dog4.jpg https://images.app.goo.gl/wiEyEo2mz5LrMwKQ8
frog0.jpg https://images.app.goo.gl/Dggsw2tvTQYBXBG29
frog1.jpg https://images.app.goo.gl/kxED4M2X9MBovfim6
frog2.jpg https://images.app.goo.gl/3BV8hN2XPpA5KYko6
frog3.jpg https://images.app.goo.gl/mEBLEcVzQezfLfy88
frog4.jpg https://images.app.goo.gl/gpbxX7rmnHeygMxeA
horse0.jpg jetson inference
horse1.jpg jetson inference
horse2.jpg https://images.app.goo.gl/iwfVuHDnZEDw5TqF6
horse3.jpg https://images.app.goo.gl/vUhQ3etEkPoRW96H8
horse4.jpg https://images.app.goo.gl/vQ3spHAZrvFL4B1z8
ship0.jpg https://images.app.goo.gl/cYgT3Z7F3Q1KWhtn7
ship1.jpg https://images.app.goo.gl/VRiqLzXymnGUqAgJ8
ship2.jpg https://images.app.goo.gl/RR8QVHgFi8XVPj67A
ship3.jpeg https://images.app.goo.gl/YHgCWbgY7FrN853u5
ship4.jpeg https://images.app.goo.gl/acCxpBFX9tuoM3TP9
truck0.jpg https://images.app.goo.gl/bTHwRb7nSxJw7fNu8
truck1.jpg https://images.app.goo.gl/jEeaSrjdpCwaX3Bn7
truck2.jpg https://images.app.goo.gl/u37AQen813uGTKpeA
truck3.jpg https://images.app.goo.gl/wxwHVmRyeWzMmMhH7
truck4.jpg https://images.app.goo.gl/mXVCZz1PiFKjyrQQ7
bottle0.jpg https://images.app.goo.gl/6QNxuQQ8VMYQbCf8A
bottle1.jpg https://images.app.goo.gl/gEjRtAHgwwShqcD28
bottle2.jpg https://images.app.goo.gl/YaKWZUq3SjWoWC1K6

bottle3.jpg https://images.app.goo.gl/wPaB7Q7u9DTbzhd38
bottle4.jpg https://images.app.goo.gl/HLKGVanQACtDxxmm8

With the selected images, the trained model is first tested on the work PC in the
Jupyter notebook to determine the performance of the original model with respect
to the test images as a reference. The script for this is test_original.py. Based on
the selection of the model parameters (dataset, model, epochs and batch size; see the
script Training_CIFAR.py), it first reads in the model to be evaluated. Then the
selected test image is loaded. This must be normalised and scaled in the same way
as in the training. Therefore, the built-in functions of TensorFlow
(tf.image.per_image_standardization and tf.image.resize) are used again for this. Finally,
the model can be run and the result evaluated. The category with the maximum
probability and the corresponding description are determined.
After the test on the work PC, the models converted to tflite format are used on the
Jetson Nano. The programme tflite-photo.py works similarly to the previous one:
the selected .tflite model is loaded, as well as the selected image, which is normalised
and resized in the same way. Before that, however, the TensorFlow
Lite interpreter is initialised and the tensors are allocated. In addition, the necessary
sizes of the input and output tensors are determined. After the input image has
been prepared and defined as the input tensor, the model is executed, including a
measurement of the computing time. Afterwards, the result can be evaluated as usual.
In this case, the result is also superimposed on the input image and the resulting
image is stored in a separate folder.

19.7.3. Test results


The results of the tests of the different models described above are recorded below.
Models were tested with the parameters dataset=1, model=2, epochs=100, batch
size=32 and dataset=2, model=2, epochs=100, batch size=32, where dataset
1 corresponds to the original dataset CIFAR-10 and dataset 2 corresponds to the
self-created dataset from CIFAR-10 with the added category „bottles“ from
CIFAR-100, each in the original form as saved_model.pb and as optimised
and non-optimised .tflite model.
For all three format variants of both trained models, the category with the highest
probability and the corresponding probability are recorded in Table 19.3 for all five
test images of each of the 10 or 11 trained categories. The correct results are marked in
green.

Table 19.3.: Test results for the different models: A - model with modified dataset,
AlexNet, 100 epochs, batch size 32 in .pb format; A1 - tflite model
without quantisation; A2 - tflite model with quantisation; B - model with
CIFAR-10 dataset, AlexNet, 100 epochs, batch size 32 in .pb format; B1 -
tflite model without quantisation; B2 - tflite model with quantisation
Image | A | A1 | A2 | B | B1 | B2
airplane0 99.9% ship 99.9% ship 99.7% ship 99.7% plane 94.4% bird 51.0% bird
airplane1 94.3% plane 94.3% plane 94.0% plane 99.9% plane 100% plane 100% plane
airplane2 62.0% auto 62.0% auto 99.2% auto 85.2% auto 85.2% auto 94.8% frog
airplane3 99.9% auto 99.9% auto 99.9% auto 100% plane 100% plane 100% plane
airplane4 99.9% plane 100% plane 99.9% plane 100% plane 100% plane 100% plane
auto0 99.7% horse 99.7% horse 100% horse 84.7% horse 84.7% horse 91.0% horse
auto1 94.6% auto 94.6% auto 97.8% auto 99.9% auto 100% auto 100% auto
auto2 42.0% bird 42.0% bird 95.9% truck 100% auto 100% auto 100% auto

auto3 100% auto 100% auto 100% auto 100% auto 100% auto 100% auto
auto4 100% horse 100% horse 100% horse 89.0% auto 89.0% auto 79.3% auto
bird0 100% bird 100% bird 100% bird 99.9% bird 99.9% bird 100% bird
bird1 95.3% horse 95.3% horse 85.2% horse 47.9% plane 47.9% plane 93.8% bird
bird2 99.7% frog 99.7% frog 99.7% frog 99.9% bird 100% bird 100% bird
bird3 80.4% truck 80.4% truck 41.5% truck 91.7% bird 91.7% bird 99.3% bird
bird4 99.9% bird 99.9% bird 99.4% bird 99.8% plane 99.8% plane 99.5% plane
cat0 97.9% horse 97.9% horse 98.6% horse 91.3% horse 91.3% horse 74.3% plane
cat1 74.2% horse 74.2% horse 66.3% horse 57.6% horse 57.6% horse 72.5% frog
cat2 99.9% horse 99.9% horse 99.8% horse 98.2% cat 98.2% cat 57.6% auto
cat3 99.6% bird 99.6% bird 99.8% bird 78.7% dog 78.7% dog 65.3% plane
cat4 99.0% horse 99.0% horse 59.6% bird 100% horse 100% horse 100% horse
deer0 86.6% cat 86.6% cat 80.2% cat 99.9% frog 100% frog 100% frog
deer1 99.8% horse 99.8% horse 99.8% horse 100% horse 100% horse 100% horse
deer2 99.9% frog 99.9% frog 99.9% frog 100% frog 100% frog 100% frog
deer3 96.6% bird 96.6% bird 96.4% bird 100% deer 100% deer 100% deer
deer4 97.3% truck 97.3% truck 97.8% truck 100% deer 100% deer 100% deer
dog0 90.0% truck 90.0% truck 97.8% truck 99.8% horse 99.8% horse 72.3% horse
dog1 98.7% dog 98.7% dog 89.0% dog 98.7% deer 100% dog 100% dog
dog2 99.3% horse 99.3% horse 99.0% horse 78.1% plane 78.1% plane 74.3% bird
dog3 99.9% bird 100% bird 100% bird 99.9% bird 99.9% bird 99.6% bird
dog4 99.9% horse 100% horse 99.99% horse 85.8% horse 85.8% horse 100% horse
frog0 95.0% truck 95.0% truck 87.19% truck 97.7% plane 97.7% plane 93.1% plane
frog1 46.5% bird 46.5% bird 44.1% frog 58.5% plane 58.5% plane 41.7% frog
frog2 99.9% horse 99.9% horse 99.9% horse 95.7% bird 100% frog 100% frog
frog3 99.3% bottle 99.3% bottle 96.3% bottle 99.5% frog 99.5% frog 99.9% frog
frog4 62.8% bird 62.8% bird 89.0% bird 58.1% plane 58.1% plane 62.3% plane
horse0 100% bird 100% bird 100% bird 99.9% horse 100% horse 100% horse
horse1 66.7% bird 66.7% bird 83.16% horse 77.5% frog 77.5% frog 99.6% frog
horse2 88.1% horse 88.1% horse 95.0% horse 99.9% horse 100% horse 100% horse
horse3 90.0% horse 90.0% horse 98.5% horse 100% horse 100% horse 100% horse
horse4 86.2% horse 86.2% horse 78.2% horse 77.7% dog 77.7% dog 50.5% bird
ship0 99.9% auto 100% auto 100% auto 85.8% truck 85.8% truck 50.9% frog
ship1 69.2% ship 69.2% ship 50.7% auto 99.0% plane 99.0% plane 99.7% plane
ship2 66.4% bird 66.4% bird 76.7% bird 50.4% frog 50.5% frog 83.3% plane
ship3 54.2% bottle 54.2% bottle 87.1% bottle 98.6% plane 98.6% plane 73.1% plane
ship4 73.8% ship 73.8% ship 97.4% truck 99.9% plane 100% plane 100% plane
truck0 52.6% horse 52.57% horse 99.9% horse 99.9% plane 100% plane 99.9% plane
truck1 99.9% truck 100% truck 100% truck 100% truck 100% truck 100% truck
truck2 99.9% truck 99.9% truck 99.6% truck 99.9% plane 99.9% plane 99.3% plane
truck3 64.5% auto 64.5% truck 99.3% auto 99.9% plane 99.9% plane 99.8% plane
truck4 99.9% bird 99.9% bird 99.9% bird 99.9% horse 100% horse 100% horse
bottle0 99.9% bottle 100% bottle 100% bottle - - -
bottle1 99.9% bird 99.9% bottle 99.5% bird - - -
bottle2 96.7% horse 96.7% horse 89.7% horse - - -
bottle3 93.7% truck 93.7% truck 99.9% truck - - -
bottle4 81.5% bird 81.5% bird 96.6% bird - - -
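For reference, the modified dataset used for models A/A1/A2 (CIFAR-10 plus the CIFAR-100 category “bottle” as an eleventh class) can be assembled along the following lines. This is only a sketch: it assumes that the fine label 'bottle' has index 9 in CIFAR-100 (alphabetical label order) and it merges all bottle images, whereas the experiments described here used a reduced dataset of 5,500 images in total:

import numpy as np
import tensorflow as tf

(x10_tr, y10_tr), (x10_te, y10_te) = tf.keras.datasets.cifar10.load_data()
(x100_tr, y100_tr), (x100_te, y100_te) = tf.keras.datasets.cifar100.load_data(label_mode="fine")

BOTTLE = 9   # assumed fine-label index of 'bottle' in CIFAR-100

# select all 'bottle' images and relabel them as the new class 10
tr_mask = y100_tr[:, 0] == BOTTLE
te_mask = y100_te[:, 0] == BOTTLE
x_train = np.concatenate([x10_tr, x100_tr[tr_mask]])
y_train = np.concatenate([y10_tr, np.full((tr_mask.sum(), 1), 10)])
x_test = np.concatenate([x10_te, x100_te[te_mask]])
y_test = np.concatenate([y10_te, np.full((te_mask.sum(), 1), 10)])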

Table 19.4 records the computing time required to classify the test images with the
respective network in .tflite format, with and without optimisation; the script
tflite-foto.py prints this time, together with the classification, on the output image
and in the terminal. All images are classified in a single run of the script.

#!/usr/bin/env python
# coding: utf-8

# Script for testing the trained models with AlexNet architecture on a chosen picture
#
# Written by C. Joachim, January 2021

# In[ ]:

import tensorflow as tf
import numpy as np
import time
from PIL import Image

# In[ ]:

# choose model parameters
Dataset = 2
Model = 2        # must not be changed
Epochs = 100
Batch = 32

# In[ ]:

# load trained model
# (the directory name is reconstructed from a line that is truncated in the source)
model = tf.keras.models.load_model("saved_model/" + "Dataset %s Model %s Epochs %s Batch %s"
                                   % (Dataset, Model, Epochs, Batch))

# In[ ]:

# choose image
file_in = "bottle3.jpg"

# In[ ]:

# prepare image - adjust size and normalize
img = tf.io.read_file("images/" + file_in)
img = tf.image.decode_jpeg(img, channels=3)
img = tf.reshape(img, (1, img.shape[0], img.shape[1], 3))
img = tf.image.per_image_standardization(img)
img = tf.image.resize(img, (227, 227))

# In[ ]:

# evaluate image with model
prob = model.predict(img)

# In[ ]:

# find class with maximum confidence
conf = np.amax(prob)
index = np.argmax(prob)

#!/usr/bin/python
#
# Script for using tflite models with photos from disc.
# The tflite models were trained with the original CIFAR-10 dataset
# and a manipulated CIFAR-10 dataset (category 'bottle' added from CIFAR-100).
#
# Documentation in the document 'Kuenstliche Intelligenz mit dem Jetson Nano' by ...
# Related to the project 'Kuenstliche Intelligenz mit dem Jetson Nano und TensorF...
#
# Written by C. Joachim in January 2021
# based on the imagenet-console from https://github.com/dusty-nv/jetson-inference
# and TensorFlow Lite inference with Python from https://www.tensorflow.org/lite/...
# and https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/examp...
#
# To run enter (in the example using the image bottle_1 and the model1)
# $ python3 ./tflite-foto.py bottle_1.jpg m1_bottle_1.jpg model1.tflite
#
# The Models:
# model1.tflite: manipulated CIFAR-10, AlexNet architecture, trained in 100 epoch...
# model2.tflite: manipulated CIFAR-10, AlexNet architecture, trained in 100 epoch...
# model3.tflite: original CIFAR-10, AlexNet architecture, trained in 100 epochs w...
# model4.tflite: original CIFAR-10, AlexNet architecture, trained in 100 epochs w...

import jetson.inference
import jetson.utils

import argparse
import sys

import tensorflow as tf
from PIL import Image
import numpy as np
import time

# parse the command line
parser = argparse.ArgumentParser(description="Classify an image using tflite models",
                                 formatter_class=argparse.RawTextHelpFormatter)

parser.add_argument("file_in", type=str, help="filename of the input image to process")
parser.add_argument("file_out", type=str, default=None, nargs='?', help="filename of the output image")
parser.add_argument("model", type=str, default="model1.tflite", help="model to load")

try:
    opt = parser.parse_known_args()[0]
except:
    print("")
    parser.print_help()
    sys.exit(0)

# Load the TFLite model and allocate tensors
interpreter = tf.lite.Interpreter("tflitemodels/" + opt.model)
interpreter.allocate_tensors()

# Get input and output tensors
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Adjust input
height = input_details[0]['shape'][1]
width = input_details[0]['shape'][2]
img = Image.open("images/" + opt.file_in).resize((width, height))
input_data = np.expand_dims(img, axis=0)
input_data = (np.float32(input_data) - np.mean(input_data)) / np.std(input_data)

As a consequence, the increased computing time caused by the first call of the model
only occurs for the image “airplane0”.

Table 19.4.: Computing time in seconds for the different models: A1 - model with
modified dataset, AlexNet, 100 epochs, batch size 32, tflite model without
quantisation; A2 - model with modified dataset, AlexNet, 100 epochs,
batch size 32, tflite model with quantisation; B1 - model with CIFAR-10
dataset, AlexNet, 100 epochs, batch size 32, tflite model without
quantisation; B2 - model with CIFAR-10 dataset, AlexNet, 100 epochs,
batch size 32, tflite model with quantisation.
Image        A1      A2      B1      B2
airplane0 2.63 0.67 2.73 0.66
airplane1 0.31 0.13 0.35 0.13
airplane2 0.32 0.13 0.32 0.13
airplane3 0.32 0.13 0.31 0.13
airplane4 0.31 0.13 0.32 0.13
auto0 0.32 0.13 0.32 0.13
auto1 0.32 0.13 0.32 0.13
auto2 0.32 0.13 0.31 0.13
auto3 0.31 0.13 0.32 0.13
auto4 0.31 0.13 0.32 0.13
bird0 0.31 0.13 0.32 0.13
bird1 0.31 0.13 0.31 0.13
bird2 0.31 0.13 0.31 0.13
bird3 0.31 0.13 0.31 0.13
bird4 0.31 0.13 0.31 0.13
cat0 0.32 0.13 0.35 0.13
cat1 0.32 0.13 0.32 0.13
cat2 0.31 0.13 0.31 0.13
cat3 0.31 0.13 0.32 0.13
cat4 0.31 0.13 0.32 0.13
deer0 0.31 0.13 0.32 0.13
deer1 0.32 0.13 0.32 0.13
deer2 0.32 0.13 0.32 0.13
deer3 0.32 0.13 0.32 0.13
deer4 0.31 0.13 0.32 0.13
dog0 0.31 0.13 0.32 0.13
dog1 0.32 0.13 0.32 0.13
dog2 0.32 0.13 0.32 0.13
dog3 0.32 0.13 0.32 0.13
dog4 0.32 0.13 0.32 0.13
frog0 0.31 0.13 0.32 0.13
frog1 0.32 0.13 0.31 0.13
frog2 0.32 0.13 0.32 0.13
frog3 0.31 0.13 0.31 0.13
frog4 0.31 0.13 0.32 0.13
horse0 0.31 0.13 0.32 0.13
horse1 0.32 0.13 0.32 0.13
horse2 0.31 0.13 0.32 0.13
horse3 0.32 0.13 0.32 0.13
horse4 0.31 0.13 0.32 0.13

ship0 0.31 0.13 0.32 0.13


ship1 0.32 0.13 0.32 0.13
ship2 0.32 0.13 0.32 0.13
ship3 0.31 0.13 0.32 0.13
ship4 0.31 0.13 0.32 0.13
truck0 0.31 0.13 0.32 0.13
truck1 0.31 0.13 0.31 0.12
truck2 0.31 0.13 0.31 0.13
truck3 0.32 0.13 0.31 0.13
truck4 0.31 0.13 0.31 0.13
bottle0 0.32 0.13 - -
bottle1 0.31 0.13 - -
bottle2 0.32 0.13 - -
bottle3 0.32 0.13 - -
bottle4 0.32 0.13 - -
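The per-image times in Table 19.4 are reported by tflite-foto.py; a minimal sketch of how such a measurement around the TFLite inference call can be taken is shown below (the exact placement of the timer in the project script is an assumption, tensor preparation is as in the listing above and variable names are illustrative):

import time
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="tflitemodels/model1.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

def classify_timed(input_data):
    """Run one inference and return (class index, confidence, elapsed time in seconds)."""
    start = time.time()
    interpreter.set_tensor(input_details[0]['index'], np.float32(input_data))
    interpreter.invoke()
    prob = interpreter.get_tensor(output_details[0]['index'])[0]
    elapsed = time.time() - start
    return int(np.argmax(prob)), float(np.amax(prob)), elapsed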

19.7.4. Evaluation of the Test Results


In general, the performance of the tested models with regard to correctly classifying
the test images is very mixed. While some of the images are classified correctly, there
are also many misclassifications to which the models nevertheless assign a high
probability. Overall, with error rates of more than 50 % (with respect to the category
with the highest probability), all models are far from a quality usable in an
application.
As expected from the different sizes of the training datasets, the model trained with
the full CIFAR-10 dataset of 60,000 images (minus the images set aside for validation
and testing) achieves more correct classifications in the test than the model trained
and tested with a total of 5,500 images. Depending on the format used, the latter
achieved around 30 % correct classifications in the tests, while the model trained with
more images correctly classified around 40 % of the images.
There is no systematic difference between the results of the .pb and the .tflite models
in the tests. In most cases, the same model in its different formats arrives at the same
category with the highest probability, and both are then correct or incorrect together.
The same applies to the non-optimised and optimised .tflite models: there are
deviations between the two, but on the basis of the tests carried out they cannot be
attributed to a lower hit rate of the optimised models, since the optimised model
sometimes classifies correctly where the non-optimised model fails, and vice versa.
The optimisation of the .tflite models, on the other hand, was clearly successful in
terms of file size, which shrank to roughly a quarter, from 225 MB to 56.4 MB, and
also in terms of the required computing time. The non-optimised models usually need
about 0.31 s to 0.32 s to classify one of the test images, while the optimised models
reduce this to about 0.13 s. Exceptions to these times occur for the first image
classified after programme start, but even there the optimisation cuts the computing
time to roughly a quarter.
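The optimisation referred to here is the standard post-training optimisation offered by the TFLite converter; the following minimal sketch shows how such an optimised model can be produced. The saved_model directory name is a placeholder, and that the project used exactly this dynamic-range quantisation is an assumption based on the observed four-fold size reduction:

import tensorflow as tf

SAVED_MODEL_DIR = "saved_model/..."   # placeholder: directory of the trained Keras model

# Plain conversion without optimisation
converter = tf.lite.TFLiteConverter.from_saved_model(SAVED_MODEL_DIR)
with open("model_plain.tflite", "wb") as f:
    f.write(converter.convert())

# Conversion with default post-training optimisation (dynamic-range quantisation);
# this kind of optimisation typically shrinks the weights to roughly a quarter of their size.
converter_opt = tf.lite.TFLiteConverter.from_saved_model(SAVED_MODEL_DIR)
converter_opt.optimizations = [tf.lite.Optimize.DEFAULT]
with open("model_optimised.tflite", "wb") as f:
    f.write(converter_opt.convert())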

19.7.5. Conclusion and Potential for Improvement


The models created are not capable of reliable classification. This seems unexpected at
first, since the well-known AlexNet architecture was used. However, the publications
of Alex Krizhevsky [KSH12a] show that even the original AlexNet, trained with
1.2 million high-resolution images of the ImageNet dataset, still has a top-1 error rate
of 37.5 %. Against this background, an error rate of roughly twice this value for
training with only 5,500 or 60,000 images at a resolution of only 32 × 32 pixels is
quite understandable. Other sources also report accuracy curves on the training and
validation data similar to those observed here, which confirms that the training was
conducted correctly and supports the results [Var20].
Because of these high error rates of models based on the AlexNet architecture, many
more complex architectures have since been developed. If reliable classification is to
be realised in an application, it would make sense to use one of these more advanced
architectures, although this would also increase the training time accordingly. The
same applies to using considerably more training images: this would lead to a stronger
generalisation of the learned patterns and reduce, for example, the risk that the model
classifies on the basis of the backgrounds instead of the actual subjects. If, however,
the training time is not to be increased by such measures, it makes sense to start from
an already pre-trained network and to train it further on the primary category to be
recognised for the application, as sketched below.
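One common way to follow this recommendation is to fine-tune a pre-trained network from tf.keras.applications on the target category. The following sketch (MobileNetV2 as an arbitrary example backbone and a two-class head 'bottle'/'no bottle') only illustrates the idea and is not part of the project code:

import tensorflow as tf

# Pre-trained backbone without its ImageNet classification head
base = tf.keras.applications.MobileNetV2(input_shape=(224, 224, 3),
                                         include_top=False, weights="imagenet")
base.trainable = False   # freeze the pre-trained weights for the first training phase

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2, activation="softmax"),   # e.g. 'bottle' / 'no bottle'
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)   # training data to be provided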
In this respect, only the first pass through the KDD scheme has been realised here. To
arrive at the actual application, the process must now be iterated so that the aspects
mentioned above are improved and, finally, a model is obtained that can be used in
practice.
Regardless of the model quality, it can be stated that the models could be converted
into the .tflite format and evaluated on the Jetson Nano without major impairments.
The conversion did not noticeably increase the error rate, and even between the
optimised and non-optimised models, contrary to expectations, no increase in the error
rate is recognisable in the tests. At the same time, the optimisation is effective in
reducing both the file size and the computing time.

19.7.6. Bottle Detection


Although the trained models are not suitable for reliable object or bottle detection, it
was nevertheless shown that, with the help of TensorFlow Lite, trained networks can
be used in the field on systems such as the Jetson Nano for real-time object detection
without much loss of model quality.
Possible programs for bottle detection using the Jetson Nano, the Pi camera and a
tflite model can be found below. They combine the use of tflite models in Python
shown above, see section ??, with the program developed in a previous project to
detect bottles with the Pi camera and the Jetson Nano.
Both programs output a signal via a user-selected GPIO pin when the bottle detection
is ready or active, and a second user-defined pin outputs a signal when a bottle is
detected in front of the camera.
The program tflite-camera.py, see Listing 19.4, continuously performs classification
on the images captured by the camera and always evaluates 50 frames together; the
result is shown on the display and signalled via the GPIO pin. The program
tflite-camera-button.py, see Listing 19.5, on the other hand, continuously displays
the camera image but only evaluates a single image, taken when the 'q' key on the
keyboard is pressed. Here, too, the result is shown on the screen and signalled via the
pin; in addition, the captured image is stored in a folder together with its
classification. A condensed sketch of the shared detection logic is given below,
followed by the listings.
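The sketch below summarises the shared detection logic; the pin numbers follow the listings, while the class index of 'bottle', the camera capture and the TFLite classification are placeholders (grab_frame and classify_frame stand for the routines shown in the listings):

import numpy as np
import Jetson.GPIO as GPIO

BottleDet = 12        # signal: bottle detected
Running = 13          # signal: detection active
BOTTLE_CLASS = 10     # assumed index of the 'bottle' class in the modified dataset

GPIO.setmode(GPIO.BOARD)
GPIO.setup(BottleDet, GPIO.OUT, initial=GPIO.LOW)
GPIO.setup(Running, GPIO.OUT, initial=GPIO.LOW)

def evaluate_frames(grab_frame, classify_frame, n_frames=50):
    """Classify n_frames camera images and signal whether the most frequent class is 'bottle'."""
    GPIO.output(Running, GPIO.HIGH)
    votes = [classify_frame(grab_frame()) for _ in range(n_frames)]
    majority = int(np.bincount(votes).argmax())
    GPIO.output(BottleDet, GPIO.HIGH if majority == BOTTLE_CLASS else GPIO.LOW)
    GPIO.output(Running, GPIO.LOW)
    return majority

Whether the 50 frames are combined by a majority vote, as assumed here, or by averaging the class probabilities makes little practical difference; Listing 19.4 shows the exact procedure used.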

#!/usr/bin/python
#
# Script for using tflite models with connected camera in 'live' mode;
# 50 frames are classified and the category with the highest probability is displayed.
# If a bottle is detected, the chosen pin is set to high.
# Another pin indicates whether the program is ready to execute detection.
# Pins to be used as output can be defined below the importing commands.
#
# The tflite models were trained with a manipulated CIFAR-10 dataset (category 'bottle' added from CIFAR-100)
#
# Documentation in the document 'Kuenstliche Intelligenz mit dem Jetson Nano' by ...
# Related to the project 'Kuenstliche Intelligenz mit dem Jetson Nano und TensorF...
#
# Written by C. Joachim in January 2021
# Based on the modified imagenet-camera for bottle detection from a previous project
# and TensorFlow Lite inference with Python from https://www.tensorflow.org/lite/...
# and https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/examp...
# original imagenet-camera from https://github.com/dusty-nv/jetson-inference
#
# The Models:
# model1.tflite: manipulated CIFAR-10, AlexNet architecture, trained in 100 epoch...
# model2.tflite: manipulated CIFAR-10, AlexNet architecture, trained in 100 epoch...
#
# Can be run with
# $ python3 ./tflite-camera.py model1.tflite
# where the latter defines the model to use.
# For further options see parsing of command line
#

import jetson.inference
import jetson.utils

import argparse
import sys

import atexit
import Jetson.GPIO as GPIO

import numpy as np
import time

BottleDet = 12
Running = 13

# prepare the pins as outputs
GPIO.setmode(GPIO.BOARD)
GPIO.setup(BottleDet, GPIO.OUT, initial=GPIO.LOW)   # signal for bottle detected
GPIO.setup(Running, GPIO.OUT, initial=GPIO.LOW)     # signal for image detection running

# parse the command line
parser = argparse.ArgumentParser(description="Classify a live camera stream using tflite models",
                                 formatter_class=argparse.RawTextHelpFormatter)
parser.add_argument("model", type=str, default="model1.tflite", help="model to load")
parser.add_argument("--camera", type=str, default="0", help="index of the MIPI CSI camera to use")
parser.add_argument("--width", type=int, default=1280, help="desired width of camera stream")
parser.add_argument("--height", type=int, default=720, help="desired height of camera stream")

try:
    opt = parser.parse_known_args()[0]
except:
    print("")

#!/usr/bin/python
#
# Script for using tflite models with connected camera in 'live' mode.
# Only classify the image whenever key 'q' is pressed (then take a photo and analyse it).
# If a bottle is detected, the chosen pin is set to high.
# Another pin indicates whether the program is ready to execute detection.
# Pins to be used as output can be defined below the importing commands.
#
# The tflite models were trained with a manipulated CIFAR-10 dataset (category 'bottle' added from CIFAR-100)
#
# Documentation in the document 'Kuenstliche Intelligenz mit dem Jetson Nano' by ...
# Related to the project 'Kuenstliche Intelligenz mit dem Jetson Nano und TensorF...
#
# Written by C. Joachim in January 2021
# Based on the modified imagenet-camera for bottle detection from a previous project
# and TensorFlow Lite inference with Python from https://www.tensorflow.org/lite/...
# and https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/examp...
# original imagenet-camera from https://github.com/dusty-nv/jetson-inference
#
# The Models:
# model1.tflite: manipulated CIFAR-10, AlexNet architecture, trained in 100 epoch...
# model2.tflite: manipulated CIFAR-10, AlexNet architecture, trained in 100 epoch...
#
# Need to install the keyboard module with
# sudo pip3 install keyboard
#
# Can be run with
# $ sudo python3 ./tflite-camera-button.py model1.tflite
# where the latter defines the model to use.
# For further options see parsing of command line
#
# Create a folder named 'out_images_camera' on the same level beforehand
#

import jetson.inference
import jetson.utils

import argparse
import sys

import atexit
import Jetson.GPIO as GPIO

import numpy as np
import time
import keyboard

BottleDet = 12
Running = 13

# prepare the pins as outputs
GPIO.setmode(GPIO.BOARD)
GPIO.setup(BottleDet, GPIO.OUT, initial=GPIO.LOW)   # signal for bottle detected
GPIO.setup(Running, GPIO.OUT, initial=GPIO.LOW)     # signal for image detection running

# parse the command line
parser = argparse.ArgumentParser(description="Classify a live camera stream using tflite models",
                                 formatter_class=argparse.RawTextHelpFormatter)
parser.add_argument("model", type=str, default="model1.tflite", help="model to load")
parser.add_argument("--camera", type=str, default="0", help="index of the MIPI CSI camera to use")
parser.add_argument("--width", type=int, default=1280, help="desired width of camera stream")
Part IV.

Appendix

20. Material List

Number  Designation                                                                                        Link                             Price
1       NVIDIA Jetson Nano Development Kit-B01                                                             Eckstein-shop.de - SS10417       147,13 €
1       NVIDIA Jetson Nano Development Kit-B01                                                             Eckstein-shop.de - WS16990       1,70 €
1       WaveShare AC8265 Wireless NIC for Jetson Nano WiFi / Bluetooth                                     Eckstein-shop.de - WS16578       17,65 €
1       WaveShare Metal Case (C) Camera Holder Internal Fan Design for Jetson Nano (B01)                   Eckstein-shop.de - WS17855       12,99 €
1       Waveshare 10.1 inch Raspberry Pi Display 1280x800 capacitive Touchscreen HDMI LCD with case IPS    Eckstein-shop.de - WS40020       109,99 €
1       Raspberry Pi Camera Modul V2 8 Megapixel features fixed focus lens on-board                        Eckstein-shop.de - RP03023       39,95 €
1       5,1V 2,5A Universal Netzteil micro USB Raspberry Pi 3 Ladegerät Power Adapter Schwarz              Eckstein-shop.de - CP07048       9,79 €
1       Ethernet Kabel CAT.5e ca. 2m                                                                       Eckstein-shop.de - ZB59003       1,08 €
1       64GB Micro SD Card HC Class 10 Speicherkarte mit Adapter                                           Eckstein-shop.de - RP02043       8,21 €
1       RPI KEYBRD DE RW Entwicklerboards - Tastatur, DE, rot/weiß                                         reichelt.de - RPI KEYBRD DE RW   16,70 €
1       RPI MOUSE RD Entwicklerboards - Maus, rot/weiß                                                     reichelt.de - RPI MOUSE RD       7,90 €

As of: 25.06.2020
Table 20.3.: List of materials for the image recognition application
21. Data Sheets

21.1. Data Overview

NVIDIA JETSON NANO MODULE
SMALL. POWERFUL. POWERED BY AI.

Bringing AI to millions of new systems at the edge


The NVIDIA® Jetson Nano™ module is opening amazing new possibilities for edge
computing. It delivers up to 472 GFLOPS of accelerated computing, can run many
modern neural networks in parallel, and delivers the performance to process data from
multiple high-resolution sensors—a requirement for full AI systems. It’s also production-
ready and supports all popular AI frameworks. This makes Jetson Nano the ideal
platform for developing mass market AI products such as AIoT gateways, smart network
video recorders and cameras, consumer robots, and optical inspection systems.
The system-on-module is powered by the NVIDIA Maxwell™ GPU with 4 GB of memory,
which allows real-time processing of high-resolution inputs. It offers a unique
combination of performance, power advantage, and a rich set of IOs, from high-speed
CSI and PCIe to low-speed I2Cs and GPIOs. Plus, it supports multiple diverse sets of
sensors to enable a variety of applications with incredible power efficiency, consuming as
little as 5 W.
Jetson Nano is supported by NVIDIA JetPack™, which includes a board support package
(BSP), Linux OS, NVIDIA CUDA®, cuDNN, and TensorRT™ software libraries for deep
learning, computer vision, GPU computing, multimedia processing, and much more.
The comprehensive software stack makes AI deployment on autonomous machines fast,
reduces complexity, and speeds time to market.
The same JetPack SDK is used across the entire NVIDIA Jetson™ family of products and
is fully compatible with NVIDIA’s world-leading AI platform for training and deploying
AI software.

KEY FEATURES

Jetson Nano module
>> 128-core NVIDIA Maxwell GPU
>> Quad-core ARM® A57 CPU
>> 4 GB 64-bit LPDDR4
>> 16 GB eMMC 5.1
>> 10/100/1000BASE-T Ethernet

Power
>> Voltage Input: 5 V
>> Module Power: 5 W~10 W

Environment
>> Operating Temperature: -25 C to 80 C*
>> Storage Temperature: -25 C to 80 C
>> Humidity: 85% RH, 85ºC [non-operational]
>> Vibration: Sinusoidal 5 G RMS 10 to 500 Hz, random 2.88 G RMS, 5 to 500 Hz [non-operational]
>> Shock: 140 G, half sine 2 ms duration [non-operational]

* See thermal design guide for details.



NVIDIA JETSON NANO MODULE


TECHNICAL SPECIFICATIONS

GPU 128-core Maxwell


CPU Quad-core ARM A57 @ 1.43 GHz
Memory 4 GB 64-bit LPDDR4 25.6 GB/s
Storage 16 GB eMMC 5.1
Video Encode 4K @ 30 | 4x 1080p @ 30 | 9x 720p @ 30 (H.264/H.265)
Video Decode 4K @ 60 | 2x 4K @ 30 | 8x 1080p @ 30 | 18x 720p @ 30
(H.264/H.265)
CSI 12 (3x4 or 4x2) lanes MIPI CSI-2 D-PHY 1.1
Connectivity Gigabit Ethernet
Display HDMI 2.0, eDP 1.4, DP 1.2 (two simultaneously)
PCIe 1x1/2/4 PCIe Gen2
USB 1x USB 3.0, 3x USB 2.0
Others I2C, I2S, SPI, UART, SD/SDIO, GPIO
Mechanical 69.6 mm x 45 mm 260-pin SODIMM Connector

Learn more at https://developer.nvidia.com/jetson

© 2019 NVIDIA Corporation. All rights reserved. NVIDIA, the NVIDIA logo, CUDA, Jetson, Jetson Nano, NVIDIA Maxwell, and TensorRT are
trademarks and/or registered trademarks of NVIDIA Corporation in the U.S. and other countries. Other company and product names may be
trademarks of the respective companies with which they are associated. ARM, AMBA and ARM Powered are registered trademarks of ARM
Limited. Cortex, MPCore and Mali are trademarks of ARM Limited. All other brands or product names are the property of their respective holders.
“ARM” is used to represent ARM Holdings plc; its operating company ARM Limited; and the regional subsidiaries ARM Inc.; ARM KK; ARM Korea
Limited.; ARM Taiwan Limited; ARM France SAS; ARM Consulting (Shanghai) Co. Ltd.; ARM Germany GmbH; ARM Embedded Technologies Pvt.
Ltd.; ARM Norway, AS and ARM Sweden AB. Jul19




21.2. Data Sheet

DATA SHEET

NVIDIA Jetson Nano System-on-Module


Maxwell GPU + ARM Cortex-A57 + 4GB LPDDR4 + 16GB eMMC

Maxwell GPU
128-core GPU | End-to-end lossless compression | Tile Caching | OpenGL® 4.6 | OpenGL ES 3.2 | Vulkan™ 1.1 | CUDA® |
OpenGL ES Shader Performance (up to): 512 GFLOPS (FP16) | Maximum Operating Frequency: 921MHz

CPU
ARM® Cortex®-A57 MPCore (Quad-Core) Processor with NEON Technology | L1 Cache: 48KB L1 instruction cache (I-cache) per core;
32KB L1 data cache (D-cache) per core | L2 Unified Cache: 2MB | Maximum Operating Frequency: 1.43GHz

Audio
Industry standard High Definition Audio (HDA) controller provides a multichannel audio path to the HDMI interface.

Memory
Dual Channel | System MMU | Memory Type: 4ch x 16-bit LPDDR4 | Maximum Memory Bus Frequency: 1600MHz |
Peak Bandwidth: 25.6 GB/s | Memory Capacity: 4GB

Storage
eMMC 5.1 Flash Storage | Bus Width: 8-bit | Maximum Bus Frequency: 200MHz (HS400) | Storage Capacity: 16GB

Boot Sources
eMMC and USB (recovery mode)

Networking
10/100/1000 BASE-T Ethernet | Media Access Controller (MAC)

Imaging
Dedicated RAW to YUV processing engines process up to 1400Mpix/s (up to 24MP sensor) | MIPI CSI 2.0 up to 1.5Gbps (per lane) |
Support for x4 and x2 configurations (up to four active streams)

Operating Requirements
Temperature Range (Tj): -25 – 97C* | Module Power: 5 – 10W | Power Input: 5.0V

Display Controller
Two independent display controllers support DSI, HDMI, DP, eDP:
MIPI-DSI (1.5Gbps/lane): Single x2 lane | Maximum Resolution: 1920x960 at 60Hz (up to 24bpp)
HDMI 2.0a/b (up to 6Gbps) | DP 1.2a (HBR2 5.4Gbps) | eDP 1.4 (HBR2 5.4Gbps) | Maximum Resolution (DP/eDP/HDMI): 3840 x 2160 at 60Hz (up to 24bpp)

Clocks
System clock: 38.4MHz | Sleep clock: 32.768kHz | Dynamic clock scaling and clock source selection

Multi-Stream HD Video and JPEG
Video Decode:
H.265 (Main, Main 10): 2160p 60fps | 1080p 240fps
H.264 (BP/MP/HP/Stereo SEI half-res): 2160p 60fps | 1080p 240fps
H.264 (MVC Stereo per view): 2160p 30fps | 1080p 120fps
VP9 (Profile 0, 8-bit): 2160p 60fps | 1080p 240fps
VP8: 2160p 60fps | 1080p 240fps
VC-1 (Simple, Main, Advanced): 1080p 120fps | 1080i 240fps
MPEG-2 (Main): 2160p 60fps | 1080p 240fps | 1080i 240fps
Video Encode:
H.265: 2160p 30fps | 1080p 120fps
H.264 (BP/MP/HP): 2160p 30fps | 1080p 120fps
H.264 (MVC Stereo per view): 1440p 30fps | 1080p 60fps
VP8: 2160p 30fps | 1080p 120fps
JPEG (Decode and Encode): 600 MP/s

Peripheral Interfaces
xHCI host controller with integrated PHY: 1 x USB 3.0, 3 x USB 2.0 | USB 3.0 device controller with integrated PHY |
EHCI controller with embedded hub for USB 2.0 | 4-lane PCIe: one x1/2/4 controller | single SD/MMC controller
(supporting SDIO 4.0, SD HOST 4.0) | 3 x UART | 2 x SPI | 4 x I2C | 2 x I2S: support I2S, RJM, LJM, PCM, TDM (multi-slot mode) | GPIOs

Mechanical
Module Size: 69.6 mm x 45 mm | PCB: 8L HDI | Connector: 260 pin SO-DIMM

Note: Refer to the software release feature list for current software support; all features may not be available for a particular OS.

Product is based on a published Khronos Specification and is expected to pass the Khronos Conformance Process. Current conformance status
can be found at www.khronos.org/conformance.

* See the Jetson Nano Thermal Design Guide for details. Listed temperature range is based on module Tj characterization.


Revision History
Version Date Description
v0.1 JAN 2019 Initial Release
v0.7 MAY 2019 Description
• Memory: corrected peak bandwidth
• Peripheral Interfaces: corrected number of available I2C interfaces
Functional Overview
• Removed block diagram; see the Jetson Nano Product Design Guide for these details
Power and System Management
• Removed On-Module Internal Power Rails table
• Updated Power Domains table
• Updated Programmable Interface Wake Event table
• Updated Power Up/Down sequence diagrams
Pin Descriptions
• Updated throughout to reflect updated pinmux
• GPIO Pins: updated table to reflect dedicated GPIO pins only (see pinmux for ALL GPIO
capable pins)
Interface Descriptions
• Updated throughout to reflect updated pinmux
• Embedded DisplayPort (eDP) Interface: clarified DP use/limitations on DP0
• MIPI Camera Serial Interface (CSI) - Updated CSI description to remove erroneous
reference to virtual channels
Physical/Electrical Characteristics
• Absolute Maximum Ratings - Added reference to Jetson Nano Thermal Design Guide for
Operating Temperature; extended IDDMAX to 5A
• Pinout: Updated to reflect updated pinmux
• Package Drawing and Dimensions – Updated drawing
v0.8 OCT 2019 Description
• Operating Requirements: corrected Module Power to reflect power for module only
(previous stated range included module + IO); updated Temperature Range for clarity,
included maximum operating temperature and updated note to reflect module
temperature is based on Tj .
v1.0 FEB 2020 Pin Descriptions
• GPIO Pins: corrected pin number listing for GPIO01
Interface Descriptions
• High-Definition Multimedia Interface (HDMI) and DisplayPort (DP) Interfaces reference to
YUV output support
• Gigabit Ethernet – Corrected Realtek Gigabit Ethernet Controller part number
Physical/Electrical Characteristics
• Operating and Absolute Maximum Ratings – Added Mounting Force to Absolute
Maximum Ratings table.
• Package Drawing and Dimensions – Updated drawing
• Environmental & Mechanical Screening – Added section


Table of Contents
1.0 Functional Overview 4
1.1 Maxwell GPU........................................................................................................................................................................ 4
1.2 CPU Complex....................................................................................................................................................................... 5
1.2.1 Snoop Control Unit and L2 Cache....................................................................................................................... 5
1.2.2 Performance Monitoring ....................................................................................................................................... 6
1.3 High-Definition Audio-Video Subsystem............................................................................................................................. 6
1.3.1 Multi-Standard Video Decoder............................................................................................................................. 6
1.3.2 Multi-Standard Video Encoder ............................................................................................................................. 6
1.3.3 JPEG Processing Block ....................................................................................................................................... 7
1.3.4 Video Image Compositor (VIC) ............................................................................................................................ 7
1.4 Image Signal Processor (ISP) ............................................................................................................................................. 8
1.5 Display Controller Complex ................................................................................................................................................. 8
1.6 Memory ................................................................................................................................................................................. 9
2.0 Power and System Management 10
2.1 Power Rails ........................................................................................................................................................................ 11
2.2 Power Domains/Islands ..................................................................................................................................................... 11
2.3 Power Management Controller (PMC).............................................................................................................................. 12
2.3.1 Resets ................................................................................................................................................................. 12
2.3.2 System Power States and Transitions .............................................................................................................. 12
2.3.2.1 ON State ................................................................................................................................................... 12
2.3.2.2 OFF State ................................................................................................................................................. 13
2.3.2.3 SLEEP State............................................................................................................................................. 13
2.4 Thermal and Power Monitoring ......................................................................................................................................... 13
2.5 Power Sequencing ............................................................................................................................................................. 14
2.5.1 Power Up ............................................................................................................................................................ 14
2.5.2 Power Down........................................................................................................................................................ 14
3.0 Pin Descriptions 15
3.1 MPIO Power-on Reset Behavior ....................................................................................................................................... 15
3.2 MPIO Deep Sleep Behavior .............................................................................................................................................. 16
3.3 GPIO Pins........................................................................................................................................................................... 17
4.0 Interface Descriptions 18
4.1 USB..................................................................................................................................................................................... 18
4.2 PCI Express (PCIe)............................................................................................................................................................ 19
4.3 Display Interfaces............................................................................................................................................................... 20
4.3.1 MIPI Display Serial Interface (DSI) .................................................................................................................... 20
4.3.2 High-Definition Multimedia Interface (HDMI) and DisplayPort (DP) Interfaces............................................... 21
4.3.3 Embedded DisplayPort (eDP) Interface ............................................................................................................ 22
4.4 MIPI Camera Serial Interface (CSI) / VI ( Video Input) ..................................................................................................... 23
4.5 SD / SDIO ........................................................................................................................................................................... 25
4.6 Inter-IC Sound (I2S)............................................................................................................................................................ 25
4.7 Miscellaneous Interfaces ................................................................................................................................................... 26
4.7.1 Inter-Chip Communication (I2C) ........................................................................................................................ 26
4.7.2 Serial Peripheral Interface (SPI) ........................................................................................................................ 27
4.7.3 UART ................................................................................................................................................................... 29
4.7.4 Gigabit Ethernet.................................................................................................................................................. 30
4.7.5 Fan ...................................................................................................................................................................... 30
4.7.6 Debug .................................................................................................................................................................. 31
5.0 Physical / Electrical Characteristics 32
5.1 Operating and Absolute Maximum Ratings ...................................................................................................................... 32
5.2 Digital Logic ........................................................................................................................................................................ 33
5.3 Pinout.................................................................................................................................................................................. 35
5.4 Package Drawing and Dimensions ................................................................................................................................... 36


1.0 Functional Overview


Designed for use in power-limited environments, the Jetson Nano squeezes industry-leading compute capabilities, 64-bit
operating capability, and integrated advanced multi-function audio, video and image processing pipelines into a 260-pin SO-
DIMM. The Maxwell GPU architecture implemented several architectural enhancements designed to extract maximum
performance per watt consumed. Core components of the Jetson Nano series module include:

- NVIDIA® Tegra® X1 series SoC
  - NVIDIA Maxwell GPU
  - ARM® quad-core Cortex®-A57 CPU Complex
- 4GB LPDDR4 memory
- 16GB eMMC 5.1 storage
- Gigabit Ethernet (10/100/1000 Mbps)
- PMIC, regulators, power and voltage monitors
- 260-pin keyed connector (exposes both high-speed and low-speed industry standard I/O)
- On-chip temperature sensors

1.1 Maxwell GPU


The Graphics Processing Cluster (GPC) is a dedicated hardware block for rasterization, shading, texturing, and compute;
most of the GPU’s core graphics functions are performed inside the GPC. Within the GPC there are multiple Streaming
Multiprocessor (SM) units and a Raster Engine. Each SM includes a Polymorph Engine and Texture Units; raster operations
remain aligned with L2 cache slices and memory controllers

The Maxwell GPU architecture introduced an all-new design for the SM, redesigned all unit and crossbar structures, optimized
data flows, and significantly improved power management. The SM scheduler architecture and algorithms were rewritten to
be more intelligent and avoid unnecessary stalls, while further reducing the energy per instruction required for scheduling. The
organization of the SM also changed; each Maxwell SM (called SMM) is now partitioned into four separate processing blocks,
each with its own instruction buffer, scheduler and 32 CUDA cores.

The SMM CUDA cores perform pixel/vertex/geometry shading and physics/compute calculations. Texture units perform
texture filtering and load/store units fetch and save data to memory. Special Function Units (SFUs) handle transcendental and
graphics interpolation instructions. Finally, the Polymorph Engine handles vertex fetch, tessellation, viewport transform,
attribute setup, and stream output. The SMM geometry and pixel processing performance make it highly suitable for rendering
advanced user interfaces and complex gaming applications; the power efficiency of the Maxwell GPU enables this
performance on devices with power-limited environments.

Features:

- End-to-end lossless compression
- Tile Caching
- Support for OpenGL 4.6, OpenGL ES 3.2, Vulkan 1.1, DirectX 12, CUDA 10 (FP16)
- Adaptive Scalable Texture Compression (ASTC) LDR profile supported
- Iterated blend, ROP OpenGL-ES blend modes
- 2D BLIT from 3D class avoids channel switch
- 2D color compression
- Constant color render SM bypass
- 2x, 4x, 8x MSAA with color and Z compression
- Non-power-of-2 and 3D textures, FP16 texture filtering


- FP16 shader support
- Geometry and Vertex attribute Instancing
- Parallel pixel processing
- Early-z reject: fast rejection of occluded pixels acts as a multiplier on pixel shader and texture performance while saving power and bandwidth
- Video protection region
- Power saving: multiple levels of clock gating for linear scaling of power

GPU frequency and voltage are actively managed by Tegra Power and Thermal Management Software and influenced by
workload. Frequency may be throttled at higher temperatures (above a specified threshold) resulting in a behavior that
reduces the GPU operating frequency. Observed chip-to-chip variance is due to NVIDIA ability to maximize performance
(DVFS) on a per-chip basis, within the available power budget.

1.2 CPU Complex


The CPU complex is a high-performance Multi-Core SMP cluster of four ARM Cortex-A57 CPUs with 2MB of L2 cache (shared
by all cores). Features include:

- Superscalar, variable-length, out-of-order pipeline
- Dynamic branch prediction with Branch Target Buffer (BTB) and Global History Buffer RAMs, a return stack, and an indirect predictor
- 48-entry fully-associative L1 instruction TLB with native support for 4KB, 64KB, and 1MB page sizes
- 32-entry fully-associative L1 data TLB with native support for 4KB, 64KB, and 1MB page sizes
- 4-way set-associative unified 1024-entry Level 2 (L2) TLB in each processor
- 48KB I-cache and 32KB D-cache for each core
- Full implementation of the ARMv8 architecture instruction set
- Embedded Trace Macrocell (ETM) based on the ETMv4 architecture
- Performance Monitor Unit (PMU) based on the PMUv3 architecture
- Cross Trigger Interface (CTI) for multiprocessor debugging
- Cryptographic Engine for crypto function support
- Interface to an external Generic Interrupt Controller (vGIC-400)
- Power management with multiple power domains

CPU frequency and voltage are actively managed by Tegra Power and Thermal Management Software and influenced by
workload. Frequency may be throttled at higher temperatures (above a specified threshold) resulting in a behavior that
reduces the CPU operating frequency. Observed chip-to-chip variance is due to NVIDIA ability to maximize performance
(DVFS) on a per-chip basis, within the available power budget.

1.2.1 Snoop Control Unit and L2 Cache


The CPU cluster includes an integrated snoop control unit (SCU) that maintains coherency between the CPUs within the
cluster and a tightly coupled L2 cache that is shared between the CPUs within the cluster. The L2 cache also provides a 128-
bit AXI master interface to access DRAM. L2 cache features include:

- 2MB L2
- Fixed line length of 64 bytes
- 16-way set-associative cache structure


- Duplicate copies of the L1 data cache directories for coherency support
- Hardware pre-fetch support
- ECC support

1.2.2 Performance Monitoring


The performance monitoring unit (part of MPCore non-CPU logic) provides six counters, each of which can count any of the
events in the processor. The unit gathers various statistics on the operation of the processor and memory system during
runtime, based on ARM PMUv3 architecture.

1.3 High-Definition Audio-Video Subsystem


The audio-video subsystem off-loads audio and video processing activities from the CPU subsystem resulting in faster, fully
concurrent, highly efficient operation.

1.3.1 Multi-Standard Video Decoder


The video decoder accelerates video decode, supporting low resolution content, Standard Definition (SD), High Definition (HD)
and UltraHD (2160p, or 4k video) profiles. The video decoder is designed to be extremely power efficient without sacrificing
performance.

The video decoder communicates with the memory controller through the video DMA which supports a variety of memory
format output options. For low power operations, the video decoder can operate at the lowest possible frequency while
maintaining real-time decoding using dynamic frequency scaling techniques.

Video standards supported:

- H.265: Main10, Main
- WEBM VP9 and VP8
- H.264: Baseline (no FMO/ASO support), Main, High, Stereo SEI (half-res)
- VC-1: Simple, Main, Advanced
- MPEG-4: Simple (with B frames, interlaced; no DP and RVLC)
- H.263: Profile 0
- DiVX: 4/5/6
- XviD Home Theater
- MPEG-2: MP

1.3.2 Multi-Standard Video Encoder


The multi-standard video encoder enables full hardware acceleration of various encoding standards. It performs high-quality
video encoding operations for applications such as video recording and video conferencing. The encode processor is designed
to be extremely power-efficient without sacrificing performance.

Video standards supported:

- H.265 Main Profile: I-frames and P-frames (no B-frames)
- H.264 Baseline/Main/High Profiles: IDR/I/P/B-frame support, MVC
- VP8
- MPEG4 (ME only)
- MPEG2 (ME only)
- VC1 (ME only): no B frames, no interlaced


1.3.3 JPEG Processing Block


The JPEG processing block is responsible for JPEG (de)compression calculations (based on JPEG still image standard),
image scaling, decoding (YUV420, YUV422H/V, YUV444, YUV400) and color space conversion (RGB to YUV; decode only).

Input (encode) formats:

- Pixel width: 8bpc
- Subsample format: YUV420
- Resolution up to 16K x 16K
- Pixel pack format
  - Semi-planar/planar for 420

Output (decode) formats:

- Pixel width: 8bpc
- Resolution up to 16K x 16K
- Pixel pack format
  - Semi-planar/planar for YUV420
  - YUY2/planar for 422H/422V
  - Planar for YUV444
  - Interleave for RGBA

1.3.4 Video Image Compositor (VIC)


The Video Image Compositor implements various 2D image and video operations in a power-efficient manner. It handles
various system UI scaling, blending and rotation operations, video post-processing functions needed during video playback,
and advanced de-noising functions used for camera capture.

Features:

- Color Decompression
- High-quality Deinterlacing
- Inverse Teleciné
- Temporal Noise Reduction
  - High-quality video playback
  - Reduces camera sensor noise
- Scaling
- Color Conversion
- Memory Format Conversion
- Blend/Composite
- 2D Bit BLIT operation
- Rotation


1.4 Image Signal Processor (ISP)


The ISP module takes data from the VI/CSI module or memory in raw Bayer format and processes it to YUV output. The
imaging subsystem supports raw (Bayer) image sensors up to 24 million pixels. Advanced image processing is used to convert
input to YUV data and remove artifacts introduced by high megapixel CMOS sensors and optics with up to 30-degree CRA.

Features:

- Flexible post-processing architecture for supporting custom computer vision and computational imaging operations
- Bayer domain hardware noise reduction
- Per-channel black-level compensation
- High-order lens-shading compensation
- 3 x 3 color transform
- Bad pixel correction
- Programmable coefficients for de-mosaic with color artifact reduction
  (Color Artifact Reduction: a two-level (horizontal and vertical) low-pass filtering scheme that is used to reduce/remove any color artifacts that may result from Bayer signal processing and the effects of sampling an image)
- Enhanced down scaling quality
- Edge Enhancement
- Color and gamma correction
- Programmable transfer function curve
- Color-space conversion (RGB to YUV)
- Image statistics gathering (per-channel)
  - Two 256-bin image histograms
  - Up to 4,096 local region averages
  - AC flicker detection (50Hz and 60Hz)
  - Focus metric block

1.5 Display Controller Complex


The Display Controller Complex integrates two independent display controllers. Each display controller is capable of
interfacing to an external display device and can drive the same or different display contents at different resolutions and
refresh rates. Each controller supports a cursor and three windows (Window A, B, and C); controller A supports two additional
simple windows (Window D, T). The display controller reads rendered graphics or video frame buffers in memory, blends them
and sends them to the display.

Features:

- Two heads. Each can be mapped to one of:
  - 1x DSI, 1x eDP/DP (Limited Functionality: No Audio)
  - 1x HDMI/DP (Full Functionality)
- 90, 180, 270-degree image transformation uses both horizontal and vertical flips (controller A only)
- Byte-swapping options on 16-bit and 32-bit boundary for all color depths
- NVIDIA Pixel Rendering Intensity and Saturation Management™ (PRISM)
- 256 x 256 cursor size
- Color Management Unit for color decompression and to enhance color accuracy (compensate for the color error specific to the display panel being used)


- Scaling and tiling in hardware for lower power operation
- Full color alpha-blending
- Captive panels
  - Secure window (Win T) for TrustZone
  - Supports cursor and up to four windows (Win A, B, C, and D)
  - 1x 2-lane MIPI DSI
  - Supports MIPI D-PHY rates up to 1.5Gbps
  - 4-lane eDP with AUX channel
  - Independent resolution and pixel clock
  - Supports display rotation and scaling in hardware
- External displays
  - Supports cursor and three windows (Window A, B, and C)
  - 1x HDMI (2.0) or DisplayPort (HBR2) interface
  - Supports display scaling in hardware

1.6 Memory
The Jetson Nano integrates 4GB of LPDDR4 over a four-channel x 16-bit interface. Memory frequency options are 204MHz
and 1600MHz; maximum frequency of 1600MHz has a theoretical peak memory bandwidth of 25.6GB/s.

The Memory Controller (MC) maximizes memory utilization while providing minimum latency access for critical CPU requests.
An arbiter is used to prioritize requests, optimizing memory access efficiency and utilization and minimizing system power
consumption. The MC provides access to main memory for all internal devices. It provides an abstract view of memory to its
clients via standardized interfaces, allowing the clients to ignore details of the memory hierarchy. It optimizes access to shared
memory resources, balancing latency and efficiency to provide best system performance, based on programmable
parameters.

Features:
- TrustZone (TZ) Secure and OS-protection regions
- System Memory Management Unit
- Dual CKE signals for dynamic power down per device
- Dynamic Entry/Exit from Self-Refresh and Power Down states

The MC can sustain high utilization over a very diverse mix of requests. For example, the MC is prioritized for bandwidth (BW)
over latency for all multimedia blocks (the multimedia blocks have been architected to prefetch and pipeline their operations to
increase latency tolerance); this enables the MC to optimize performance by coalescing, reordering, and grouping requests to
minimize memory power. DRAM also has modes for saving power when it is either not being used, or during periods of
specific types of use.


2.0 Power and System Management


The Jetson Nano module operates from a single power source (VDD_IN) with all internal module voltages and I/O voltages
generated from this input. This enables the on-board power management controller to implement a tiered structure of power
and clock gating in a complex environment that optimizes power consumption based on workload:

- Power Management Controller (PMC) and Real Time Clock (RTC): These blocks reside in an Always On (not power gated) partition. The PMC provides an interface to an external power manager IC or PMU. It primarily controls voltage transitions for the SoC as it transitions to/from different low power modes; it also acts as a slave receiving dedicated power/clock request signals as well as wake events from various sources (e.g., SPI, I2C, RTC, USB attach) which can wake the system from a deep-sleep state. The RTC maintains the ability to wake the system based on either a timer event or an external trigger (e.g., key press).
- Power Gating: The SoC aggressively employs power-gating (controlled by PMC) to power off modules which are idle. CPU cores are on a separate power rail to allow complete removal of power and eliminate leakage. Each CPU can be power gated independently. Software provides context save/restore to/from DRAM.
- Clock Gating: Used to reduce dynamic power in a variety of power states.
- Dynamic Voltage and Frequency Scaling (DVFS): Raises voltages and clock frequencies when demand requires, lowers them when less is sufficient, and removes them when none is needed. DVFS is used to change the voltage and frequencies in the following power domains: CPU, CORE, and GPU.

An optional back up battery can be attached to the PMIC_BBAT module input. It is used to maintain the RTC voltage when
VDD_IN is not present. This pin is connected directly to the onboard PMIC. When a backup cell is connected to the PMIC, the
RTC will retain its contents and can be configured to charge the backup cell.

The following backup cells may be attached to the PMIC_BBAT pin:


 Super Capacitor (gold cap, double layer electrolytic)
 Standard capacitors (tantalum)
 Rechargeable Lithium Manganese cells

The backup cells MUST provide a voltage in the range 2.5V to 3.5V. They are charged with a constant current (CC), constant
voltage (CV) charger that can be configured between 2.5V and 3.5V CV output and 50µA to 800µA CC.

Table 1 Power and System Control Pin Descriptions

Pin Name Direction Type PoR Description

251–260 VDD_IN Input 5.0V Power: Main DC input, supplies PMIC and other regulators

235 PMIC_BBAT Bidirectional 1.65V-5.5V Power: PMIC Battery Back-up. Optionally used to
provide back-up power for the Real-Time Clock (RTC). Connects to a lithium cell or super
capacitor on the carrier board. The PMIC is the supply when charging the cap or coin cell;
the super cap or coin cell is the source when the system is disconnected from power.

240 SLEEP/WAKE* Input CMOS – 5.0V PU Sleep / Wake. Configured as GPIO for optional use
to place system in sleep mode or wake system from
sleep.

Pin Name Direction Type PoR Description


214 FORCE_RECOVERY* Input CMOS – 1.8V PU Force Recovery: strap pin

237 POWER_EN Input CMOS – 5.0V Module on/off: high = on, low = off.

233 SHUTDOWN_REQ* Output CMOS – 5.0V z Shutdown Request: used by the module to request a
shutdown from the carrier board (POWER_EN low).
100kΩ pull-up to VDD_IN (5V) on the module.

239 SYS_RESET* Bidirectional Open Drain, 1.8V 1 Module Reset. Reset to the module when driven low
by the carrier board. When module power sequence
is complete used as carrier board supply enable.
Used to ensure proper power on/off sequencing
between module and carrier board supplies. 4.7kΩ
pull-up to 1.8V on the module.

178 MOD_SLEEP* Output CMOS – 1.8V Indicates the module sleep status. Low is in sleep
mode, high is normal operation. This pin is
controlled by system software and should not be
modified.

2.1 Power Rails


VDD_IN must be supplied by the carrier board that the Jetson Nano is designed to connect to. It must meet the required
electrical specifications detailed in Section 5. All Jetson Nano interfaces are referenced to on-module voltage rails; no I/O
voltage is required to be supplied to the module. See the Jetson Nano Product Design Guide for details of connecting to each
of the interfaces.

2.2 Power Domains/Islands


Power domains and power islands are used to optimize power consumption for various low-power use cases and to limit
leakage current. The RTC domain is always on; the CORE, CPU, and GPU domains can be turned on and off. The CPU, CORE and
GPU power domains also contain power-gated islands which are used to power individual modules (as needed) within each
domain. Clock-gating is additionally applied during powered-on but idle periods to further reduce unnecessary power
consumption. Clock-gating can be applied to both power-gated and non-power-gated (NPG) islands.

Table 2 Power Domains


Power Domain Power Island in Domain Modules in Power Island
RTC N/A PMC (Power Management Controller)
(VDD_RTC)
RTC (Real Time Clock)
CORE NPG (Non-Power-Gated) AHB, APB Bus, AVP, Memory Controller (MC/EMC), USB 2.0, SDMMC
(VDD_SOC)
VE, VE2 ISPs (image signal processing) A and B, VI (video input), CSI (Camera Serial
Interface)
NVENC Video Encode
NVDEC Video Decode
NVJPG JPG accelerator and additional Video Decode
PCX PCIe
SOR HDMI, DSI, DP
IRAM IRAM
DISP-A, DISP-B Display Controllers A and B
XUSBA, XUSBB, XUSBC USB 3.0
VIC VIC (Video Image Compositor)
ADSP APE (Audio Processing Engine)
DFD Debug logic

Power Domain Power Island in Domain Modules in Power Island


GPU GPU 3D, FE, PD, PE, RAST, SM, ROP
(VDD_GPU)
CPU CPU 0 CPU 0
(VDD_CPU)
CPU 1 CPU 1
CPU 2 CPU 2
CPU 3 CPU 3
Non-CPU L2 Cache for Main CPU complex
TOP Top level logic

2.3 Power Management Controller (PMC)


The PMC power management features enable both high-speed operation and very low-power standby states. The PMC
primarily controls voltage transitions for the SoC as it transitions to/from different low-power modes; it also acts as a slave
receiving dedicated power/clock request signals as well as wake events from various sources (e.g., SPI, I2C, RTC, USB
attach) which can wake the system from deep sleep state. The PMC enables aggressive power-gating capabilities on idle
modules and integrates specific logic to maintain defined states and control power domains during sleep and deep sleep
modes.

2.3.1 Resets
The PMC receives the primary reset event (from SYS_RESET*) and generates various resets for: PMC, RTC, and CAR. From
the PMC provided reset, the Clock and Reset (CAR) controller generates resets for most of the blocks in the module. In
addition to reset events, the PMC receives other events (e.g., thermal, WatchDog Timer (WDT), software, wake) which also
result in variants of system reset.

The RTC block includes an embedded real-time clock and can wake the system based on either a timer event or an external
trigger (e.g., key press).

2.3.2 System Power States and Transitions


The Jetson module operates in three main power modes: OFF, ON, and SLEEP. Transitions between these states are
triggered by various hardware or software events. Figure 1 shows the transitions between these states.

Figure 1 Power State Diagram

[State diagram omitted: OFF transitions to ON via an ON EVENT, ON to SLEEP via a SLEEP EVENT, SLEEP back to ON via a WAKE EVENT, and ON to OFF via an OFF EVENT.]

2.3.2.1 ON State
The ON power state is entered from either OFF or SLEEP states. In this state the Jetson module is fully functional and
operates normally. An ON event has to occur for a transition between OFF and ON states. The only ON EVENT currently used
is a low to high transition on the POWER_EN pin. This must occur with VDD_IN connected to a power rail, and POWER_EN is
asserted (at a logic1). The POWER_EN control is the carrier board indication to the Jetson module that the VDD_VIN power is

good. The carrier board should assert this high only when VDD_IN has reached its required voltage level and is stable. This
prevents the Jetson module from powering up until the VDD_IN power is stable.

NOTE: The Jetson Nano module does include an Auto-Power-On option; a system input that enables the
module to power on if asserted. For more information on available signals and broader system
usage, see the Jetson Nano Product Design Guide.

2.3.2.2 OFF State


The OFF state is the default state when the system is not powered. It can only be entered from the ON state, through an OFF
event. OFF Events are listed in the table below.

Table 3 OFF State Events

Event Details Preconditions

HW Shutdown Set the POWER_EN pin to zero for at least 100µs; the internal PMIC will In ON State
start the shutdown sequence

SW Shutdown Software initiated shutdown ON state, Software operational

Thermal Shutdown If the internal temperature of the Jetson module reaches an unsafe Any power state
temperature, the hardware is designed to initiate a shutdown

2.3.2.3 SLEEP State


The Sleep state can only be entered from the ON state. This state allows the Jetson module to quickly resume to an
operational state without performing a full boot sequence. In this state the Jetson module operates in low power with enough
circuitry powered to allow the device to resume and re-enter the ON state. During this state the output signals from Jetson
module are maintained at their logic level prior to entering the state (i.e., they do not change to a 0V level).

The SLEEP state can only be entered directly by software. For example, when running an OS, a period with no active
operations for a certain time can trigger the OS to initiate a transition to the SLEEP state.

To Exit the SLEEP state a WAKE event must occur. WAKE events can occur from within the Jetson module or from external
devices through various pins on the Jetson Nano connector. A full list of Wake enabled pins is available in the pinmux.

Table 4 SLEEP State Events

Event Details

RTC WAKE up Timers within the Jetson module can be programmed on SLEEP entry; when these expire they
create a WAKE event to exit the SLEEP state.

Thermal Condition If the Jetson module internal temperature exceeds programmed hot and cold limits the system is
forced to wake up, so it can report and take appropriate action (shut down for example)

USB VBUS detection If VBUS is applied to the system (USB cable attached) then the device can be configured to Wake
and enumerate

2.4 Thermal and Power Monitoring


The Jetson Nano is designed to operate under various workloads and environmental conditions. It has been designed so that
an active or passive heat sinking solution can be attached. The module contains various methods through hardware and
software to limit the internal temperature to within operating limits. See the Jetson Nano Thermal Design Guide for more
details.
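On a running system the module temperatures can also be inspected from software. The following sketch assumes the JetPack/L4T image exposes the on-module sensors as standard Linux thermal zones under /sys/class/thermal (zone names vary between releases):

    from pathlib import Path

    # Print every thermal zone reported by the kernel (values are in millidegrees Celsius).
    for zone in sorted(Path("/sys/class/thermal").glob("thermal_zone*")):
        name = (zone / "type").read_text().strip()
        millicelsius = int((zone / "temp").read_text().strip())
        print(f"{name}: {millicelsius / 1000:.1f} °C")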

2.5 Power Sequencing


The Jetson Nano module is required to be powered on and off in a known sequence. Sequencing is determined through a set
of control signals; the SYS_RESET* signal (when deasserted) is used to indicate when the carrier board can power on. The
following sections provide an overview of the power sequencing steps between the carrier board and Jetson Nano module.
Refer to the Jetson Nano Product Design Guide for system level details on the application of power, power sequencing, and
monitoring. The Jetson Nano module and the product carrier board must be power sequenced properly to avoid potential
damage to components on either the module or the carrier board system.

2.5.1 Power Up
During power up, the carrier board must wait until the signal SYS_RESET* is deasserted from the Jetson module before
enabling its power; the Jetson module will deassert the SYS_RESET* signal to enable the complete system to boot.

NOTE: I/O pins cannot be high (>0.5V) before SYS_RESET* goes high. When SYS_RESET* is low, the
maximum voltage applied to any I/O pin is 0.5V. For more information, refer to the Jetson Nano
Product Design Guide.

Figure 2 Power-up Sequence (No Power Button – Auto-Power-On Enabled)

[Timing diagram omitted: shows VDD_IN, POWER_EN, Module Power, SYS_RESET*, and the Carrier Board Supplies during the power-up sequence.]

2.5.2 Power Down


In a shutdown event the Jetson module asserts SHUTDOWN_REQ*. The SHUTDOWN_REQ* must be serviced by the carrier
board to toggle POWER_EN from high to low, even in cases of sudden power loss. The Jetson module starts the power off
sequence when POWER_EN is deasserted; SYS_RESET* is asserted by the Jetson module, allowing the carrier board to put
any components into a known state and power down.

Figure 3 Power Down Sequence (Initiated by SHUTDOWN_REQ* Assertion)

[Timing diagram omitted: shows VDD_IN, SHUTDOWN_REQ*, and POWER_EN during the power-down sequence, with an annotated timing constraint of T < 10µs.]

3.0 Pin Descriptions


The primary interface to the Jetson Nano is a 260-pin SO-DIMM connector. The connector exposes power, ground, and
high-speed and low-speed industry-standard I/O connections. See the NVIDIA Jetson Nano Product Design Guide for details on
integrating the module and mating connector into product designs.

The I/O pins on the SO-DIMM are comprised of both Single Function I/O (SFIO) and Multi-Purpose digital I/O (MPIO) pins.
Each MPIO can be configured to act as a GPIO or it can be assigned for use by a particular I/O controller. Though each MPIO
has up to five functions (GPIO function and up to four SFIO functions), a given MPIO can only act as a single function at a
given point in time. On the Jetson module, each pin is fixed either to a single SFIO function or to use as a GPIO. The
different MPIO pins share a similar structure, but there are several varieties of such pins. The varieties are designed to
minimize the number of on-board components (such as level shifters or pull-up resistors) required in Jetson Nano designs.

MPIO pin types:

 ST (standard) pins are the most common pins on the chip. They are used for typical General Purpose I/O.
 DD (dual-driver) pins are similar to the ST pins. A DD pin can tolerate its I/O pin being pulled up to 3.3V (regardless
of supply voltage) if the pin’s output-driver is set to open-drain mode. There are special power-sequencing
considerations when using this functionality.
NOTE: The output of DD pins cannot be pulled High during deep-power-down (DPD).

 CZ (controlled output impedance) pins are optimized for use in applications requiring tightly controlled output
impedance. They are similar to ST pins except for changes in the drive strength circuitry and in the weak pull-ups/-
downs. CZ pins are included on the VDDIO_SDMMC3 (Module SDMMC pins) power rail; also includes a CZ_COMP
pin. Circuitry within the Jetson module continually matches the output impedance of the CZ pins to the on-board pull-
up/-down resistors attached to the CZ_COMP pins.
 LV_CZ (low voltage-controlled impedance) pins are similar to CZ pins but are optimized for use with a 1.2V supply
voltage (and signaling level). They support a 1.8V supply voltage (and signaling level) as a secondary mode. The
Jetson Nano uses LV_CZ pins for SPI interfaces operating at 1.8V.
 DP_AUX pins are used as the auxiliary control channel for DisplayPort, which requires differential signaling. Because
the same I/O block is used for DisplayPort and HDMI (to keep the control path to the display interface minimal), the
DP_AUX pins can also operate in open-drain mode so that HDMI's control path (the DDC interface, which requires I2C)
can be carried on the same pins.

Each MPIO pin consists of:

 An output driver with tristate capability, drive strength controls and push-pull mode, open-drain mode, or both
 An input receiver with either Schmitt mode, CMOS mode, or both
 A weak pull-up and a weak pull-down

MPIO pins are partitioned into multiple “pin control groups” with controls being configured for the group. During normal
operation, these per-pin controls are driven by the pinmux controller registers. During deep sleep, the PMC bypasses and then
resets the pinmux controller registers. Software reprograms these registers as necessary after returning from deep sleep.

Refer to the Tegra X1 (SoC) Technical Reference Manual for more information on modifying pin controls.

3.1 MPIO Power-on Reset Behavior


Each MPIO pin has a deterministic power-on reset (PoR) state. The particular reset state for each pin is chosen to minimize
the need of on-board components like pull-up resistors in a Jetson Nano-based system. For example, the on-chip weak pull-
ups are enabled during PoR for pins which are usually used to drive active-low chip selects.

3.2 MPIO Deep Sleep Behavior


Deep Sleep is an ultra-low-power standby state in which the Jetson Nano maintains much of its I/O state while most of the
chip is powered off. The following lists offer a simplified description of deep sleep entry and exit, concentrating on those
aspects which relate to the MPIO pins. During deep sleep most of the pins are put in a state called Deep Power Down (DPD).
The sequence for entering DPD is the same for all pins; pins differ only in which features remain available while in DPD.

NOTE: The output of DD pins cannot be pulled High during deep-power-down (DPD).
OD pins do NOT retain their output during DPD. OD pins should NOT be configured as GPIOs in a platform
where they are expected to hold a value during DPD.

Not all MPIO pins behave identically during deep sleep. They differ with regard to:
 Input buffer behavior during deep sleep
- Forcibly disabled OR
- Enabled for use as a “GPIO wake event” OR
- Enabled for some other purpose (e.g., a “clock request” pin)
 Output buffer behavior during deep sleep
- Maintain a static programmable (0, 1, or tristate) constant value OR
- Capable of changing state (i.e., dynamic while the chip is still in deep sleep)
 Weak pull-up/pull-down behavior during deep sleep
- Forcibly disabled OR
- Can be configured
 Pins that do not enter deep sleep
- Some pins whose outputs are dynamic during deep sleep are of a special type and do not enter deep sleep
(e.g., pins associated with PMC logic or with JTAG never enter deep sleep).

Figure 4 DPD Wait Times

3.3 GPIO Pins


The Jetson Nano has multiple dedicated GPIOs. Each GPIO can be individually configured as an output, input, or interrupt
source with level/edge controls. The pins listed in the following table are dedicated GPIOs; some have alternate SFIO
functionality. Many other pins not included in this list can be configured as GPIOs instead of the SFIO functionality the
pin name suggests (e.g., UART, SPI, I2S, etc.). All pins that support GPIO functionality have this exposed in the pinmux.
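For software access to these pins, a minimal sketch using NVIDIA's Jetson.GPIO Python library (shipped with JetPack) is shown below. Pin 12 refers to the 40-pin expansion header in BOARD numbering, not to the SO-DIMM pin numbers in Table 5, and is only an illustrative choice:

    import time
    import Jetson.GPIO as GPIO

    GPIO.setmode(GPIO.BOARD)                    # use 40-pin header numbering
    GPIO.setup(12, GPIO.OUT, initial=GPIO.LOW)  # configure the pin as an output
    try:
        for _ in range(5):                      # toggle the pin a few times
            GPIO.output(12, GPIO.HIGH)
            time.sleep(0.5)
            GPIO.output(12, GPIO.LOW)
            time.sleep(0.5)
    finally:
        GPIO.cleanup()                          # release the pin configuration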

Table 5 Dedicated GPIO Pin Descriptions

Pin Name Direction Type PoR Alternate Function

87 GPIO00 Bidirectional Open-Drain [DD] 0 USB VBUS Enable (USB_VBUS_EN0)

118 GPIO01 Bidirectional CMOS – 1.8V [ST] pd Camera MCLK #2 (CLK)


124 GPIO02 Bidirectional CMOS – 1.8V [ST] pd

126 GPIO03 Bidirectional CMOS – 1.8V [ST] pd

127 GPIO04 Bidirectional CMOS – 1.8V [ST] pd

128 GPIO05 Bidirectional CMOS – 1.8V [ST] pd

130 GPIO06 Bidirectional CMOS – 1.8V [ST] pd

206 GPIO07 Bidirectional CMOS – 1.8V [ST] pd Pulse Width Modulation Signal (PWM)

208 GPIO08 Bidirectional CMOS – 1.8V [ST] pd SD Card Detect (SDMMC_CD)

211 GPIO09 Bidirectional CMOS – 1.8V [ST] pd Audio Clock (AUD_MCLK)


212 GPIO10 Bidirectional CMOS – 1.8V [ST] pd

216 GPIO11 Bidirectional CMOS – 1.8V [ST] pd Camera MCLK #3

218 GPIO12 Bidirectional CMOS – 1.8V [ST] pd

228 GPIO13 Bidirectional CMOS – 1.8V [ST] pd Pulse Width Modulation Signal
230 GPIO14 Bidirectional CMOS – 1.8V [ST] pd Pulse Width Modulation Signal

114 CAM0_PWDN Bidirectional CMOS – 1.8V [ST] pd

120 CAM1_PWDN Bidirectional CMOS – 1.8V [ST] pd

4.0 Interface Descriptions


The following sections outline the interfaces available on the Jetson Nano module and details the module pins used to interact
with and control each interface. See the Tegra X1 Series SoC Technical Reference Manual for complete functional
descriptions, programming guidelines and register listings for each of these blocks.

4.1 USB
Standard Notes

Universal Serial Bus Specification Revision 3.0 Refer to specification for related interface timing details.

Universal Serial Bus Specification Revision 2.0 USB Battery Charging Specification, version 1.0; including Data Contact Detect protocol
Modes: Host and Device
Speeds: Low, Full, and High
Refer to specification for related interface timing details.
Enhanced Host Controller Interface Specification Refer to specification for related interface timing details.
for Universal Serial Bus revision 1.0

An xHCI/Device controller (named XUSB) supports the xHCI programming model for scheduling transactions and interface
management. As a host it natively supports USB 3.0, USB 2.0, and USB 1.1 transactions on its USB 3.0 and USB 2.0
interfaces. The XUSB controller supports USB 2.0 L1 and L2 (suspend) link power management and USB 3.0 U1, U2, and U3
(suspend) link power management. It supports remote wakeup, wake on connect, wake on disconnect, and wake on
overcurrent in all power states, including sleep mode.

USB 2.0 Ports

Each USB 2.0 port operates in USB 2.0 High Speed mode when connecting directly to a USB 2.0 peripheral and operates in
USB 1.1 Full- and Low-Speed modes when connecting directly to a USB 1.1 peripheral. All USB 2.0 ports operating in High
Speed mode share one High-Speed Bus Instance, which means 480 Mb/s theoretical bandwidth is distributed across these
ports. All USB 2.0 ports operating in Full- or Low-Speed modes share one Full/Low-Speed Bus Instance, which means 12
Mb/s theoretical bandwidth is distributed across these ports.

USB 3.0 Port

The USB 3.0 port only operates in USB 3.0 Super Speed mode (5 Gb/s theoretical bandwidth).

Table 6 USB 2.0 Pin Descriptions

Pin Name Direction Type Description

87 GPIO0 Input USB VBUS, 5V USB 0 VBUS Detect (USB_VBUS_EN0). Do not feed 5V
directly into this pin; see the Jetson Nano Product Design
Guide for complete details.

109 USB0_D_N Bidirectional USB PHY USB 2.0 Port 0 Data


111 USB0_D_P

115 USB1_D_N Bidirectional USB PHY USB 2.0 Port 1 Data


117 USB1_D_P

121 USB2_D_N Bidirectional USB PHY USB 2.0 Port 2 Data


123 USB2_D_P

Table 7 USB 3.0 Pin Descriptions

Pin Name Direction Type Description

163 USBSS_RX_P Input USB SS PHY USB 3.0 SS Receive


161 USBSS_RX_N

Pin Name Direction Type Description


168 USBSS_TX_P Output USB SS PHY USB 3.0 SS Transmit
166 USBSS_TX_N

4.2 PCI Express (PCIe)


Standard Notes
PCI Express Base Specification Revision 2.0 Jetson Nano meets the timing requirements for the Gen2 (5.0 GT/s) data rates. Refer to
specification for complete interface timing details.
Although NVIDIA validates that the Jetson Nano design complies with the PCIe
specification, PCIe software support may be limited.

The Jetson module integrates a single PCIe Gen2 controller supporting:


 Connections to a single (x1/2/4) endpoint
 Upstream and downstream AXI interfaces that serve as the control path from the Jetson Nano to the external PCIe
device.
 Gen1 (2.5 GT/s/lane) and Gen2 (5.0 GT/s/lane) speeds.

NOTE: Upstream Type 1 Vendor Defined Messages (VDM) should be sent by the Endpoint Port (EP) if the
Root Port (RP) also belongs to same vendor/partner; otherwise the VDM is silently discarded.

See the Jetson Nano Product Design Guide for supported USB 3.0/PCIe configuration and connection examples.

Table 8 PCIe Pin Descriptions

Pin Name Direction Type PoR Description

179 PCIE_WAKE* Input Open Drain 3.3V z PCI Express Wake


This signal is used as the PCI Express defined WAKE#
signal. When asserted by a PCI Express device, it is a
request that system power be restored. No interrupt or other
consequences result from the assertion of this signal. On
module 100kΩ pull-up to 3.3V

160 PCIE0_CLK_N Output PCIe PHY 0 PCIe Reference Clock


162 PCIE0_CLK_P 0

180 PCIE0_CLKREQ* Bidirectional Open Drain 3.3V z PCIe Reference Clock Request
This signal is used by a PCIe device to indicate it needs the
PCIE0_CLK_N and PCIE0_CLK_P to actively drive reference
clock. On module 47kΩ pull-up to 3.3V

181 PCIE0_RST* Output Open Drain 3.3V 0 PCIe Reset


This signal provides a reset signal to all PCIe links. It must be
asserted 100 ms after the power to the PCIe slots has
stabilized. On module 47kΩ pull-up to 3.3V

157 PCIE0_RX3_P Input PCIe PHY PCIe Receive (Lane 3)


155 PCIE0_RX3_N

151 PCIE0_RX2_P Input PCIe PHY PCIe Receive (Lane 2)


149 PCIE0_RX2_N

139 PCIE0_RX1_P Input PCIe PHY PCIe Receive (Lane 1)


137 PCIE0_RX1_N

133 PCIE0_RX0_P Input PCIe PHY PCIe Receive (Lane 0)


131 PCIE0_RX0_N

156 PCIE0_TX3_P Output PCIe PHY PCIe Transmit (Lane 3)


154 PCIE0_TX3_N

Pin Name Direction Type PoR Description


150 PCIE0_TX2_P Output PCIe PHY PCIe Transmit (Lane 2)
148 PCIE0_TX2_N

142 PCIE0_TX1_P Output PCIe PHY PCIe Transmit (Lane 1)


140 PCIE0_TX1_N

136 PCIE0_TX0_P Output PCIe PHY PCIe Transmit (Lane 0)


134 PCIE0_TX0_N

4.3 Display Interfaces


The Jetson Nano Display Controller Complex integrates a MIPI-DSI interface and Serial Output Resource (SOR) to collect
pixels from the output of the display pipeline, format/encode them to desired format, and then streams to various output
devices. The SOR consists of several individual resources which can be used to interface with different display devices such
as HDMI, DP, or eDP.

4.3.1 MIPI Display Serial Interface (DSI)


The Display Serial Interface (DSI) is a serial bit-stream replacement for the parallel MIPI DPI and DBI display interface
standards. DSI reduces package pin-count and I/O power consumption. DSI support enables both display controllers to
connect to an external display(s) with a MIPI DSI receiver. The DSI transfers pixel data from the internal display controller to
an external third-party LCD module.

Features:
 PHY Layer
- Start / End of Transmission. Other out-of-band signaling
- Per DSI interface: one Clock Lane; two Data Lanes
- Supports link configuration – 1x 2
- Maximum link rate 1.5Gbps as per MIPI D-PHY 1.1v version
- Maximum 10MHz LP receive rate
 Lane Management Layer with Distributor
 Protocol Layer with Packet Constructor
 Supports MIPI DSI 1.0.1v version mandatory features
 Command Mode (One-shot) with Host and/or display controller as master
 Clocks
- Bit Clock: Serial data stream bit-rate clock
- Byte Clock: Lane Management Layer Byte-rate clock
- Application Clock: Protocol Layer Byte-rate clock.
 Error Detection / Correction
- ECC generation for packet Headers
- Checksum generation for Long Packets
 Error recovery
 High-Speed Transmit timer
 Low-Power Receive timer
 Turnaround Acknowledge Timeout

Table 9 DSI Pin Descriptions

Pin Name Direction Type Description

76 DSI_CLK_N Output MIPI D-PHY Differential output clock for DSI interface
78 DSI_CLK_P

82 DSI_D1_N Output MIPI D-PHY Differential data lanes for DSI interface.
84 DSI_D1_P

70 DSI_D0_N Bidirectional MIPI D-PHY Differential data lanes for DSI interface. DSI lane can read data
72 DSI_D0_P back from the panel side in low power (LP) mode.

4.3.2 High-Definition Multimedia Interface (HDMI) and DisplayPort (DP) Interfaces


Standard: High-Definition Multimedia Interface (HDMI) Specification, version 2.0
Notes: >340MHz pixel clock; scrambling support; Clock/4 support (1/40 bit-rate clock)

The HDMI and DP interfaces share the same set of interface pins. A new transport mode was introduced in HDMI 2.0 to
enable link clock frequencies greater than 340MHz and up to 600MHz. For transfer rates above 340MHz, there are two main
requirements:

 All link data, including active pixel data, guard bands, data islands and control islands must be scrambled.
 The TMDS clock lane must toggle at CLK/4 instead of CLK. Below 340MHz, the clock lane toggles as normal
(independent of the state of scrambling).

Features:

 HDMI
- HDMI 2.0 mode (3.4Gbps < data rate <= 6Gbps)
- HDMI 1.4 mode (data rate<=3.4Gbps)
- Multi-channel audio from HDA controller, up to eight channels 192kHz 24-bit.
- Vendor Specific Info-frame (VSI) packet transmission
- 24-bit RGB pixel formats
- Transition Minimized Differential Signaling (TMDS) functional up to 340MHz pixel clock rate
 DisplayPort
- Display Port mode: interface is functional up to 540MHz pixel clock rate (i.e., 1.62GHz for RBR, 2.7GHz for HBR,
and 5.4GHz for HBR2).
- 8b/10b encoding support
- External Dual Mode standard support
- Audio streaming support

Table 10 HDMI Pin Descriptions

Pin Name Direction Type Description

83 DP1_TXD3_P Differential Output AC-Coupled on Carrier Board DP Data lane 3 or HDMI Differential Clock. AC
81 DP1_TXD3_N [DP] coupling required on carrier board. For HDMI, pull-
downs (with disable) also required on carrier board.

77 DP1_TXD2_P Differential Output AC-Coupled on Carrier Board HDMI Differential Data lanes 2:0. AC coupling required
75 DP1_TXD2_N [DP] on carrier board. For HDMI, pull-downs (with disable)
also required on carrier board.
71 DP1_TXD1_P
69 DP1_TXD1_N HDMI:
65 DP1_TXD0_P DP1_TXD2_[P,N] = HDMI Lane 0
DP1_TXD1_[P,N] = HDMI Lane 1

Pin Name Direction Type Description


63 DP1_TXD0_N DP1_TXD0_[P,N] = HDMI Lane 2

96 DP1_HPD Input CMOS – 1.8V [ST] HDMI Hot Plug detection. Level shifter required as this
pin is not 5V tolerant.

94 HDMI_CEC Bidirectional Open Drain, 1.8V [DD] Consumer Electronics Control (CEC) one-wire serial
bus.
NVIDIA provides low level CEC APIs (read/write).
These are not supported in earlier Android releases.
For additional CEC support, 3rd party libraries need to
be made available.

100 DP1_AUX_P Bidirectional Open-Drain, 1.8V (3.3V tolerant DDC Serial Clock for HDMI. Level shifter required; pin
- DDC) [DP_AUX] is not 5V tolerant.

98 DP1_AUX_N Bidirectional Open-Drain, 1.8V (3.3V tolerant DDC Serial Data. Level shifter required; pin is not 5V
- DDC) tolerant.

Table 11 DisplayPort on DP1 Pin Descriptions

Pin Name Direction Type Description

83 DP1_TXD3_P Differential Output AC-Coupled on Carrier Board DisplayPort 1 Differential Data lanes 2:0. AC coupling
81 DP1_TXD3_N [DP] required on carrier board.
77 DP1_TXD2_P DP1_TXD2_[P,N] = DP Lane 2
75 DP1_TXD2_N DP1_TXD1_[P,N] = DP Lane 1
71 DP1_TXD1_P DP1_TXD0_[P,N] = DP Lane 0
69 DP1_TXD1_N
65 DP1_TXD0_P
63 DP1_TXD0_N

96 DP1_HPD Input CMOS – 1.8V [ST] DisplayPort 1 Hot Plug detection. Level shifter required
and must be non-inverting.

100 DP1_AUX_P Bidirectional Open-Drain, 1.8V [DP_AUX] DisplayPort 1 auxiliary channels. AC coupling required
98 DP1_AUX_N on carrier board.

4.3.3 Embedded DisplayPort (eDP) Interface


Standard Notes
Embedded DisplayPort 1.4 Supported eDP 1.4 features:
 Additional link rates
 Enhanced framing
 Power sequencing
 Reduced aux timing
 Reduced main voltage swing

eDP is a mixed-signal interface consisting of four differential serial output lanes and one PLL. This PLL is used to generate a
high-frequency bit clock from an input pixel clock, enabling 10-bit parallel data per lane to be handled at the pixel rate for
the desired Embedded DisplayPort (eDP) mode (1.6GHz for RBR; 2.16GHz, 2.43GHz, and 2.7GHz for HBR; 3.42GHz,
4.32GHz, and 5.4GHz for HBR2).

NOTE: eDP has been tested according to DP1.2b PHY CTS even though eDPv1.4 supports lower swing
voltages and additional intermediate bit rates. This means the following nominal voltage levels
(400mV, 600mV, 800mV, 1200mV) and data rates (RBR, HBR, HBR2) are tested. This interface can
be tuned to drive lower voltage swings below 400mV and can be programmed to other intermediate
bit rates as per the requirements of the panel and the system designer.

DisplayPort on DP0 is limited to display functionality only; no HDCP or audio support.

Table 12 eDP (or DisplayPort on DP0) Pin Descriptions

Pin Name Direction Type Description

59 DP0_TXD3_P Differential Output AC-Coupled on Carrier DP0 Differential Data. AC coupling & pull-
57 DP0_TXD3_N Board downs (with disable) required on carrier board.
53 DP0_TXD2_P [DP] DP0_TXD3_[P,N] = DisplayPort 0 Data Lane 3
51 DP0_TXD2_N DP0_TXD2_[P,N] = DisplayPort 0 Data Lane 2
47 DP0_TXD1_P DP0_TXD1_[P,N] = DisplayPort 0 Data Lane 1
45 DP0_TXD1_N DP0_TXD0_[P,N] = DisplayPort 0 Data Lane 0
41 DP0_TXD0_P
39 DP0_TXD0_N

88 DP0_HPD Input CMOS – 1.8V [ST] DP0 Hot Plug detection. Level shifter required
as this pin is not 5V tolerant

92 DP0_AUX_P Bidirectional AC-Coupled on Carrier DP0 auxiliary channels. AC coupling required


90 DP0_AUX_N Board [DP_AUX] on Carrier board.

4.4 MIPI Camera Serial Interface (CSI) / VI (Video Input)


Standard

MIPI CSI 2.0 Receiver specification

MIPI D-PHY ® v1.2 Physical Layer specification

The Camera Serial Interface (CSI) is based on MIPI CSI 2.0 standard specification and implements the CSI receiver which
receives data from an external camera module with CSI transmitter. The Video Input (VI) block receives data from the CSI
receiver and prepares it for presentation to system memory or the dedicated image signal processor (ISP) execution
resources.

Features:

 Supports both x4-lane and x2-lane sensor camera configurations:


- x4 only configuration (up to three active streams)
- x4 + x2 configurations (up to four active streams)
 Supported input data formats:
- RGB: RGB888, RGB666, RGB565, RGB555, RGB444
- YUV: YUV422-8b, YUV420-8b (legacy), YUV420-8b, YUV444-8b
- RAW: RAW6, RAW7, RAW8, RAW10, RAW12, RAW14
- DPCM: user defined
- User defined: JPEG8
- Embedded: Embedded control information
 Supports single-shot mode
 Physical Interface (MIPI D-PHY) Modes of Operation
- High Speed Mode – High-speed differential signaling up to 1.5Gbps; burst transmission for low power
- Low Power Control – Single-ended 1.2V CMOS level; low-speed signaling for handshaking.
- Low Power Escape – Low-speed signaling for data, used for escape command entry only.

If the two streams come from a single source, then the streams are separated using a filter indexed on different data types. In
case of separation using data types, the normal data type is separated from the embedded data type.
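In practice, a CSI camera (for example the Raspberry Pi Camera v2 used elsewhere in this book) is usually accessed through the libargus GStreamer element rather than by programming the CSI/VI blocks directly. The sketch below assumes an OpenCV build with GStreamer support, as shipped with JetPack; the resolution and frame rate are illustrative:

    import cv2

    # GStreamer pipeline: capture from the CSI camera and convert to BGR for OpenCV.
    pipeline = (
        "nvarguscamerasrc ! "
        "video/x-raw(memory:NVMM), width=1280, height=720, framerate=30/1 ! "
        "nvvidconv ! video/x-raw, format=BGRx ! "
        "videoconvert ! video/x-raw, format=BGR ! appsink"
    )
    cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)
    ok, frame = cap.read()
    if ok:
        print("Captured frame with shape:", frame.shape)
    cap.release()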

Table 13 CSI Pin Descriptions

Pin Name Direction Type Description

10 CSI0_CLK_N Input MIPI D-PHY CSI 0 Clock–


12 CSI0_CLK_P Input MIPI D-PHY CSI 0 Clock+

4 CSI0_D0_N Input MIPI D-PHY CSI 0 Data 0–

6 CSI0_D0_P Input MIPI D-PHY CSI 0 Data 0+

16 CSI0_D1_N Input MIPI D-PHY CSI 0 Data 1–


18 CSI0_D1_P Input MIPI D-PHY CSI 0 Data 1+

3 CSI1_D0_N Input MIPI D-PHY CSI 1 Data 0–

5 CSI1_D0_P Input MIPI D-PHY CSI 1 Data 0+

15 CSI1_D1_N Input MIPI D-PHY CSI 1 Data 1–

17 CSI1_D1_P Input MIPI D-PHY CSI 1 Data 1+

28 CSI2_CLK_N Input MIPI D-PHY CSI 2 Clock–

30 CSI2_CLK_P Input MIPI D-PHY CSI 2 Clock+

22 CSI2_D0_N Input MIPI D-PHY CSI 2 Data 0–

24 CSI2_D0_P Input MIPI D-PHY CSI 2 Data 0+

34 CSI2_D1_N Input MIPI D-PHY CSI 2 Data 1–

36 CSI2_D1_P Input MIPI D-PHY CSI 2 Data 1+

27 CSI3_CLK_N Input MIPI D-PHY CSI 3 Clock–


29 CSI3_CLK_P Input MIPI D-PHY CSI 3 Clock+

21 CSI3_D0_N Input MIPI D-PHY CSI 3 Data 0–

23 CSI3_D0_P Input MIPI D-PHY CSI 3 Data 0+

33 CSI3_D1_N Input MIPI D-PHY CSI 3 Data 1–


35 CSI3_D1_P Input MIPI D-PHY CSI 3 Data 1+

52 CSI4_CLK_N Input MIPI D-PHY CSI 4 Clock–

54 CSI4_CLK_P Input MIPI D-PHY CSI 4 Clock+

46 CSI4_D0_N Input MIPI D-PHY CSI 4 Data 0–

48 CSI4_D0_P Input MIPI D-PHY CSI 4 Data 0+

58 CSI4_D1_N Input MIPI D-PHY CSI 4 Data 1–

60 CSI4_D1_P Input MIPI D-PHY CSI 4 Data 1+

40 CSI4_D2_N Input MIPI D-PHY CSI 4 Data 2–


42 CSI4_D2_P Input MIPI D-PHY CSI 4 Data 2+

64 CSI4_D3_N Input MIPI D-PHY CSI 4 Data 3–

66 CSI4_D3_P Input MIPI D-PHY CSI 4 Data 3+

Table 14 Camera Clock and Control Pin Descriptions

Pin Name I/O Pin Type PoR Description

213 CAM_I2C_SCL Bidirectional Open Drain – 3.3V [DD] z Camera I2C Clock

215 CAM_I2C_SDA Bidirectional Open Drain – 3.3V [DD] z Camera I2C Data

Pin Name I/O Pin Type PoR Description


116 CAM0_MCLK Output CMOS – 1.8V [ST] PD Camera 1 Reference Clock

114 CAM0_PWDN Output CMOS – 1.8V [ST] PD Camera 1 Powerdown or GPIO

122 CAM1_MCLK Output CMOS – 1.8V [ST] PD Camera 2 Reference Clock

120 CAM1_PWDN Output CMOS – 1.8V [ST] PD Camera 2 Powerdown or GPIO

4.5 SD / SDIO

Standard Notes

SD Specifications Part A2 SD Host Controller Standard


Specification Version 4.00

SD Specifications Part 1 Physical Layer Specification Version 4.00

SD Specifications Part E1 SDIO Specification Version 4.00 Support for SD 4.0 Specification without UHS-II

Embedded Multimedia Card (eMMC), Electrical Standard 5.1

The SecureDigital (SD)/Embedded MultiMediaCard (eMMC) controller is used to support the on-module eMMC and a single
SDIO interface made available for use with SDIO peripherals; it supports Default and High-Speed modes.

The SDMMC controller has a direct memory interface and is capable of initiating data transfers between memory and an
external device. It supports both the SD and eMMC bus protocols and has an APB slave interface to access configuration
registers. The interface is intended to support various compatible peripherals with an SD/MMC interface.

Table 15 SD/SDIO Controller I/O Capabilities

Controller Bus Width Supported I/O bus Max Bandwidth Notes


Voltages (V) clock (MHz) (MBps)

SD/SDIO Card 4 1.8 / 3.3 208 104 Available at connector for SDIO or SD Card use

eMMC 8 1.8 200 400 On-module eMMC

Table 16 SD/SDIO Pin Descriptions

Pin Name I/O Pin Type PoR Description


229 SDMMC_CLK Output CMOS – 1.8V [CZ] PD SDIO/MMC Clock

227 SDMMC_CMD Bidirectional CMOS – 1.8V [CZ] PU SDIO/MMC Command

225 SDMMC_DAT3 Bidirectional CMOS – 1.8V [CZ] PU SDIO/MMC Data bus


223 SDMMC_DAT2
221 SDMMC_DAT1
219 SDMMC_DAT0

208 GPIO08 Bidirectional CMOS – 1.8V [ST] PD SD Card Detect (SDMMC_CD)

4.6 Inter-IC Sound (I2S)


Standard

Inter-IC Sound (I2S) specification

The I2S controller transports streaming audio data between system memory and an audio codec. The I2S controller supports
I2S format, Left-justified Mode format, Right-justified Mode format, and DSP mode format, as defined in the Philips inter-IC-
sound (I2S) bus specification.

The I2S and PCM (master and slave modes) interfaces support clock rates up to 24.5760MHz.

The I2S controller supports point-to-point serial interfaces for the I2S digital audio streams. I2S-compatible products, such as
compact disc players, digital audio tape devices, digital sound processors, and those with digital TV sound may be directly
connected to the I2S controller. The controller also supports the PCM and telephony mode of data-transfer. Pulse-Code-
Modulation (PCM) is a standard method used to digitize audio (particularly voice) patterns for transmission over digital
communication channels. The Telephony mode is used to transmit and receive data to and from an external mono CODEC in
a slot-based scheme of time-division multiplexing (TDM). The I2S controller supports bidirectional audio streams and can
operate in half-duplex or full-duplex mode.

Features:

 Basic I2S modes to be supported (I2S, RJM, LJM and DSP) in both Master and Slave modes.
 PCM mode with short (one-bit-clock wide) and long-fsync (two bit-clocks wide) in both master and slave modes.
 NW-mode with independent slot-selection for both Tx and Rx
 TDM mode with flexibility in number of slots and slot(s) selection.
 Capability to drive-out a High-z outside the prescribed slot for transmission
 Flow control for the external input/output stream.

Table 17 Audio Pin Descriptions

Pin Name Direction Type PoR Description

211 GPIO09 Output CMOS – 1.8V [ST] PD Audio Codec Master Clock (AUD_MCLK)

195 I2S0_DIN Input CMOS – 1.8V [CZ] PD I2S Audio Port 0 Data In
193 I2S0_DOUT Output CMOS – 1.8V [CZ] PD I2S Audio Port 0 Data Out

197 I2S0_FS Bidirectional CMOS – 1.8V [CZ] PD I2S Audio Port 0 Frame Select (Left/Right Clock)

199 I2S0_SCLK Bidirectional CMOS – 1.8V [CZ] PD I2S Audio Port 0 Clock

222 I2S1_DIN Input CMOS – 1.8V [ST] PD I2S Audio Port 1 Data In
220 I2S1_DOUT Output CMOS – 1.8V [ST] PD I2S Audio Port 1 Data Out

224 I2S1_FS Bidirectional CMOS – 1.8V [ST] PD I2S Audio Port 1 Frame Select (Left/Right Clock)

226 I2S1_SCLK Bidirectional CMOS – 1.8V [ST] PD I2S Audio Port 1 Clock

4.7 Miscellaneous Interfaces

4.7.1 Inter-Chip Communication (I2C)


Standard

NXP inter-IC-bus (I2C) specification

This general-purpose I2C controller allows system expansion with I2C-based devices as defined in the NXP inter-IC-bus (I2C)
specification. The I2C bus supports serial communication with multiple devices; the I2C controller handles clock source
negotiation, speed negotiation for standard and fast devices, and 7-bit slave addressing according to the I2C protocol, and it
supports both master and slave modes of operation.

The I2C controller supports the following operating modes: Master – Standard-mode (up to 100Kbit/s), Fast-mode (up to 400
Kbit/s), Fast-mode plus (Fm+, up to 1Mbit/s); Slave – Standard-mode (up to 100Kbit/s), Fast-mode (up to 400 Kbit/s), Fast-
mode plus (Fm+, up to 1Mbit/s).

Table 18 I2C Pin Descriptions

Pin Name I/O Pin Type PoR Description

185 I2C0_SCL Bidirectional Open Drain – 3.3V [DD] z Only 3.3V devices supported without level
187 I2C0_SDA z shifter. I2C 0 Clock/Data pins. On module 2.2kΩ
pull-up to 3.3V.

189 I2C1_SCL Bidirectional Open Drain – 3.3V [DD] z Only 3.3V devices supported without level
191 I2C1_SDA z shifter. I2C 1 Clock/Data pins. On module 2.2kΩ
pull-up to 3.3V.

232 I2C2_SCL Bidirectional Open Drain – 1.8V [DD] z Only 1.8V devices supported without level
234 I2C2_SDA z shifter. I2C 2 Clock/Data pins. On module 2.2kΩ
pull-up to 1.8V.

213 CAM_I2C_SCL Bidirectional Open Drain – 3.3V [DD] z Only 3.3V devices supported without level
215 CAM_I2C_SDA z shifter. Camera I2C Clock/Data pins. On module
4.7kΩ pull-up to 3.3V.

4.7.2 Serial Peripheral Interface (SPI)


The SPI controllers operate at up to 65Mbps in master mode and 45Mbps in slave mode. They allow duplex, synchronous,
serial communication between the controller and external peripheral devices. The interface consists of four signals: SS_N
(chip select), SCK (clock), MOSI (master data out / slave data in), and MISO (slave data out / master data in). Data is
transferred on MOSI or MISO, depending on the transfer direction, on every SCK edge; the receiver always samples the data
on the other edge of SCK.

Features:
 Independent Rx FIFO and Tx FIFO.
 Software controlled bit-length supports packet sizes of 1 to 32 bits.
 Packed mode support for bit-length of 7 (8-bit packet size) and 15 (16-bit packet size).
 SS_N can be selected to be controlled by software, or it can be generated automatically by the hardware on packet
boundaries.
 Receive compare mode (controller listens for a specified pattern on the incoming data before receiving the data in the
FIFO).
 Simultaneous receive and transmit supported
 Supports Master mode. Slave mode has not been validated
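A master-mode transfer can be issued from user space through the Linux spidev driver, assuming the SPI controller has been enabled (e.g., via the Jetson-IO pinmux tool). The sketch below is illustrative; the bus/chip-select numbers and the transmitted bytes are placeholders:

    import spidev

    spi = spidev.SpiDev()
    spi.open(0, 0)                       # bus 0, chip select 0
    spi.max_speed_hz = 10_000_000        # well below the 65Mbps controller limit
    spi.mode = 0                         # CPOL=0, CPHA=0 (polarity is programmable)
    rx = spi.xfer2([0x9F, 0x00, 0x00])   # full-duplex transfer: 3 bytes out, 3 bytes in
    print("Received:", [hex(b) for b in rx])
    spi.close()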

Table 19 SPI Pin Descriptions

Pin Name Direction Type PoR Description

95 SPI0_CS0* Bidirectional CMOS – 1.8V [LV-CZ] PU SPI 0 Chip Select 0

97 SPI0_CS1* Bidirectional CMOS – 1.8V [LV-CZ] PU SPI 0 Chip Select 1


93 SPI0_MISO Bidirectional CMOS – 1.8V [LV-CZ] PD SPI 0 Master In / Slave Out

89 SPI0_MOSI Bidirectional CMOS – 1.8V [LV-CZ] PD SPI 0 Master Out / Slave In

91 SPI0_SCK Bidirectional CMOS – 1.8V [LV-CZ] PD SPI 0 Clock

110 SPI1_CS0* Bidirectional CMOS – 1.8V [CZ] PU SPI 1 Chip Select 0

112 SPI1_CS1* Bidirectional CMOS – 1.8V [CZ] PU SPI 1 Chip Select 1

108 SPI1_MISO Bidirectional CMOS – 1.8V [CZ] PD SPI 1 Master In / Slave Out

Pin Name Direction Type PoR Description


104 SPI1_MOSI Bidirectional CMOS – 1.8V [CZ] PD SPI 1 Master Out / Slave In

106 SPI1_SCK Bidirectional CMOS – 1.8V [CZ] PD SPI 1 Clock

Figure 5 SPI Master Timing Diagram

[Timing diagram omitted: shows SPIx_CSx, SPIx_SCK, SPIx_MOSI, and SPIx_MISO with the master-mode timing parameters (tCS, tCSS, tCSH, tSU, tHD) defined in Table 20.]

Table 20 SPI Master Timing Parameters


Symbol Parameter Minimum Maximum Unit

Fsck SPIx_SCK clock frequency 65 MHz

Psck SPIx_SCK period 1/Fsck ns

tCH SPIx_SCK high time 50%Psck -10% 50%Psck +10% ns

tCL SPIx_SCK low time 50%Psck -10% 50%Psck +10% ns

tCRT SPIx_SCK rise time (slew rate) 0.1 V/ns

tCFT SPIx_SCK fall time (slew rate) 0.1 V/ns

tSU SPIx_MOSI setup to SPIx_SCK rising edge 2 ns

tHD SPIx_MOSI hold from SPIx_SCK rising edge 3 ns

tCSS SPIx_CSx setup time 2 ns

tCSH SPIx_CSx hold time 3 ns

tCS SPIx_CSx high time 10 ns

Note: Polarity of SCLK is programmable. Data can be driven or input relative to either the rising edge (shown above) or falling edge.

Figure 6 SPI Slave Timing Diagram

[Timing diagram omitted: shows SPIx_CSx, SPIx_SCK, SPIx_MOSI, and SPIx_MISO with the slave-mode timing parameters defined in Table 21.]

Table 21 SPI Slave Timing Parameters


Symbol Parameter Minimum Maximum Unit

tSCP SPIx_SCK period 2*(tSDD + tMSU¹) ns

tSCH SPIx_SCK high time tSDD + tMSU¹ ns

tSCL SPIx_SCK low time tSDD + tMSU¹ ns

tSCSU SPIx_CSx setup time 1 tSCP

tSCSH SPIx_CSx high time 1 tSCP

tSCCS SPIx_SCK rising edge to SPIx_CSx rising edge 1 1 tSCP

tSDSU SPIx_MOSI setup to SPIx_SCK rising edge 1 1 ns

tSDH SPIx_MOSI hold from SPIx_SCK rising edge 2 11 ns

1. tMSU is the setup time required by the external master


Note: Polarity of SCLK is programmable. Data can be driven or input relative to either the rising edge (shown above) or falling edge.

4.7.3 UART
The UART controller provides serial data synchronization and data conversion (parallel-to-serial and serial-to-parallel) for both
receiver and transmitter sections. Synchronization for serial data stream is accomplished by adding start and stop bits to the
transmit data to form a data character. Data integrity is accomplished by attaching a parity bit to the data character. The parity
bit can be checked by the receiver for any transmission bit errors.

NOTE: The UART receiver input has low baud rate tolerance in 1-stop bit mode. External devices must use
2 stop bits.
In 1-stop bit mode, the Tegra UART receiver can lose sync between Tegra receiver and the external
transmitter resulting in data errors/corruption. In 2-stop bit mode, the extra stop bit allows the Tegra
UART receiver logic to align properly with the UART transmitter.

Features:
 Synchronization for the serial data stream with start and stop bits to transmit data and form a data character
 Supports both 16450- and 16550-compatible modes. Default mode is 16450
 Device clock up to 200MHz, baud rate of 12.5Mbits/second
 Data integrity by attaching parity bit to the data character
 Support for word lengths from five to eight bits, an optional parity bit and one or two stop bits
 Support for modem control inputs
 DMA capability for both Tx and Rx
 8-bit x 36 deep Tx FIFO
 11-bit x 36 deep Rx FIFO. Three bits of 11 bits per entry log the Rx errors in FIFO mode (break, framing, and parity
errors as bits 10, 9, 8 of FIFO entry)
 Auto sense baud detection
 Timeout interrupts to indicate if the incoming stream stopped
 Priority interrupts mechanism
 Flow control support on RTS and CTS
 Internal loopback
 SIR encoding/decoding (3/16 or 4/16 baud pulse widths to transmit bit zero)
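The UARTs are exposed as standard Linux serial devices, so they can be exercised with pyserial. The device name /dev/ttyTHS1 below is an assumption (the mapping of UART controllers to device nodes depends on the L4T release and carrier board), and the traffic shown is illustrative:

    import serial

    ser = serial.Serial("/dev/ttyTHS1", baudrate=115200, timeout=1.0)
    ser.write(b"hello\r\n")      # transmit a test string
    reply = ser.read(64)         # read up to 64 bytes or until the timeout expires
    print("Received:", reply)
    ser.close()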

Table 22 UART Pin Descriptions

Pin Name Direction Type PoR Description

99 UART0_TXD Output CMOS – 1.8V [ST] PD UART 0 Transmit


101 UART0_RXD Input CMOS – 1.8V [ST] PU UART 0 Receive

103 UART0_RTS* Output CMOS – 1.8V [ST] PD UART 0 Request to Send

105 UART0_CTS* Input CMOS – 1.8V [ST] PD UART 0 Clear to Send

203 UART1_TXD Output CMOS – 1.8V [ST] PD UART 1 Transmit


205 UART1_RXD Input CMOS – 1.8V [ST] PD UART 1 Receive

207 UART1_RTS* Output CMOS – 1.8V [ST] PD UART 1 Request to Send

209 UART1_CTS* Input CMOS – 1.8V [ST] PU UART 1 Clear to Send

236 UART2_TXD Output CMOS – 1.8V [ST] PD UART 2 Transmit

238 UART2_RXD Input CMOS – 1.8V [ST] PD UART 2 Receive

4.7.4 Gigabit Ethernet


The Jetson Nano integrates a Realtek RTL81119ICG Gigabit Ethernet controller. The on-module Ethernet controller supports:

 10/100/1000 Mbps Gigabit Ethernet


 IEEE 802.3u Media Access Controller (MAC)
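From Linux, the negotiated link state of the on-module controller can be read through the network sysfs entries. The sketch below assumes the interface is named eth0, which may differ depending on the system configuration:

    from pathlib import Path

    eth = Path("/sys/class/net/eth0")
    state = (eth / "operstate").read_text().strip()    # e.g. "up" or "down"
    try:
        speed = (eth / "speed").read_text().strip()    # negotiated speed in Mbps
    except OSError:
        speed = "unknown"                              # reading speed fails while the link is down
    print(f"eth0 is {state} at {speed} Mbps")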

Table 23 Gigabit Ethernet Pin Descriptions

Pin Name Direction Type Description

194 GBE_LED_ACT Output Activity LED (yellow) enable

188 GBE_LED_LINK Output Link LED (green) enable. The Link LED only illuminates when a 1000Mbps
link is established; 10/100 links will not light the Link LED.

184 GBE_MDI0_N Bidirectional MDI GbE Transformer Data 0


186 GBE_MDI0_P

190 GBE_MDI1_N Bidirectional MDI GbE Transformer Data 1


192 GBE_MDI1_P

196 GBE_MDI2_N Bidirectional MDI GbE Transformer Data 2


198 GBE_MDI2_P

202 GBE_MDI3_N Bidirectional MDI GbE Transformer Data 3


204 GBE_MDI3_P

4.7.5 Fan
The Jetson Nano includes PWM and Tachometer functionality to enable fan control as part of a thermal solution. The Pulse
Width Modulator (PWM) controller is a frequency divider with a varying pulse width. The PWM runs off a device clock
programmed in the Clock and Reset controller and can be any frequency up to the device clock maximum speed of 48MHz.
The PWFM gets divided by 256 before being subdivided based on a programmable value.
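On the stock JetPack image the fan PWM is typically driven by the pwm-fan kernel driver and can be set from user space. The sysfs path below is an assumption that may vary between L4T releases, and writing it requires root privileges:

    from pathlib import Path

    target = Path("/sys/devices/pwm-fan/target_pwm")
    target.write_text("128")                      # 0 = off, 255 = full speed
    print("Fan target PWM:", target.read_text().strip())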

Table 24 Fan Pin Descriptions

Pin Name Direction Type PoR Description

230 GPIO14 Output CMOS – 1.8V [ST] PD Fan PWM

208 GPIO08 Input CMOS – 1.8V [CZ] PD Fan Tachometer

4.7.6 Debug
A debug interface is supported via JTAG on-module test points or serial interface over UART1. The JTAG interface can be
used for SCAN testing or for communicating with the integrated CPU. See the NVIDIA Jetson Nano Product Design Guide for more
information.

Table 25 Debug Pin Descriptions

Pin Name I/O Pin Type PoR Description

- JTAG_RTCK Output CMOS – 1.8V [JT_RST] 0 Return Test Clock


- JTAG_TCK Input CMOS – 1.8V [JT_RST] z Test Clock

- JTAG_TDI Input CMOS – 1.8V [JT_RST] PU Test Data In

- JTAG_TDO Output CMOS – 1.8V [ST] z Test Data Out

- JTAG_TMS Input CMOS – 1.8V [JT_RST] PU Test Mode Select

- JTAG_GP0 Input CMOS – 1.8V [JT_RST] PD Test Reset

236 UART2_TXD Output CMOS – 1.8V [ST] PD Debug UART Transmit

238 UART2_RXD Input CMOS – 1.8V [ST] PD Debug UART Receive

5.0 Physical / Electrical Characteristics

5.1 Operating and Absolute Maximum Ratings


The parameters listed in the following table are specific to a temperature range and operating voltage. Operating the Jetson Nano
module beyond these parameters is not recommended; exceeding these conditions for extended periods may adversely affect
device reliability.

WARNING: Exceeding the listed conditions may damage and/or affect long-term reliability of the part.
The Jetson Nano module should never be subjected to conditions extending beyond the
ratings listed below.

Table 26 Recommended Operating Conditions


Symbol Parameter Minimum Typical Maximum Unit

VDDDC VDD_IN 4.75 5.0 5.25 V

PMIC_BBAT 1.65 5.5 V

Absolute maximum ratings describe stress conditions. These parameters do not set minimum and maximum operating
conditions that will be tolerated over extended periods of time. If the device is exposed to these parameters for extended
periods of time, no guarantee is made and device reliability may be affected. It is not recommended to operate the Jetson
Nano module under these conditions.

Table 27 Absolute Maximum Ratings


Symbol Parameter Minimum Maximum Unit Notes

VDDMAX VDD_IN -0.5 5.5 V

PMIC_BBAT -0.3 6.0 V

IDDMAX VDD_IN Imax 5 A

VM_PIN Voltage applied to any powered -0.5 VDD + 0.5 V VDD + 0.5V when CARRIER_PWR_ON high &
I/O pin associated I/O rail powered.
I/O pins cannot be high (>0.5V) before
CARRIER_PWR_ON goes high.
When CARRIER_PWR_ON is low, the maximum
voltage applied to any I/O pin is 0.5V
DD pins configured as open -0.5 3.63 V The pin’s output-driver must be set to open-drain
drain mode
TOP Operating Temperature -25 97 °C See the Jetson Nano Thermal Design Guide for
details.
TSTG Storage Temperature (ambient) -40 80 °C

MMAX Mounting Force 4.0 kgf kilogram-force (kgf). Maximum force applied to
PCB. See the Jetson Nano Thermal Design
Guide for additional details on mounting a thermal
solution.

5.2 Digital Logic


Voltages less than the minimum stated value can be interpreted as an undefined state or logic level low which may result in
unreliable operation. Voltages exceeding the maximum value can damage and/or adversely affect device reliability.

Table 28 CMOS Pin Type DC Characteristics

Symbol Description Minimum Maximum Units

VIL Input Low Voltage -0.5 0.25 x VDD V

VIH Input High Voltage 0.75 x VDD VDD + 0.5 V

VOL Output Low Voltage (IOL = 1mA) --- 0.15 x VDD V

VOH Output High Voltage (IOH = -1mA) 0.85 x VDD --- V

Table 29 Open Drain Pin Type DC Characteristics

Symbol Description Minimum Maximum Units

VIL Input Low Voltage -0.5 0.25 x VDD V

VIH Input High Voltage 0.75 x VDD 3.63 V

VOL Output Low Voltage (IOL = 1mA) --- 0.15 x VDD V

I2C[1,0] Output Low Voltage (IOL = 2mA) (see note) --- 0.3 x VDD V

VOH Output High Voltage (IOH = -1mA) 0.85 x VDD --- V


Note: I2C[1,0]_[SCL, SDA] pins pull up to 3.3V through an on-module 2.2kΩ resistor. I2C2_[SCL, SDA] pins pull up to 1.8V through an on-module 2.2kΩ resistor.
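As a quick illustration of these ratio-based limits, the following minimal sketch evaluates the Table 28 thresholds assuming VDD = 1.8V, the rail used by the module's CMOS pins; the fixed 3.63V ceiling in Table 29 applies to the open-drain pins pulled up to 3.3V.

```python
# Sketch: evaluate the Table 28 CMOS thresholds for VDD = 1.8 V (the module's CMOS I/O rail).
VDD = 1.8  # volts

v_il_max = 0.25 * VDD   # highest input still guaranteed to read as logic low
v_ih_min = 0.75 * VDD   # lowest input guaranteed to read as logic high
v_ol_max = 0.15 * VDD   # worst-case output low level at IOL = 1 mA
v_oh_min = 0.85 * VDD   # worst-case output high level at IOH = -1 mA

print(f"VIL(max) = {v_il_max:.2f} V, VIH(min) = {v_ih_min:.2f} V")   # 0.45 V, 1.35 V
print(f"VOL(max) = {v_ol_max:.2f} V, VOH(min) = {v_oh_min:.2f} V")   # 0.27 V, 1.53 V
# Inputs between VIL(max) and VIH(min) are undefined and may be read unreliably.
```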

5.3 Environmental & Mechanical Screening


Module performance was assessed against a series of industry standard tests designed to evaluate robustness and estimate
the failure rate of an electronic assembly in the environment in which it will be used. Mean Time Between Failures (MTBF)
calculations are produced in the design phase to predict a product’s future reliability in the field.

Table 30 Jetson Nano Reliability Report


Test Reference Standards / Test Conditions
Temperature Humidity Biased JESD22-A101
85°C / 85% RH, 168 hours, Power ON
Temperature Cycling JESD22-A104, IPC9701
-40°C to 105°C, 250 cycles, non-operational
Humidity Steady State NVIDIA Standard
45°C 90% RH 336hrs, operational
Mechanical Shock – 140G JESD22-B110
140G, half sine, 1 shock/orientation, 6 orientations total, non-operational
Mechanical Shock – 50G IEC600068-2-27
50G, half sine,1 shock/orientation, 6 orientations total, operational
Connector Insertion Cycling EIA-364
400-pin connector, 30 cycles
Sine Vibration – 3G IEC60068-2-6
3G, 10-500 Hz, 1 sweep/axis, 3 axes total, non-operational
Random Vibration – 2G IEC60068-2-64
10-500 Hz, 2 Grms,1 hour/axis, non-operational


Random Vibration – 1G IEC60068-2-64
10-500 Hz, 1 Grms,1 hour/axis, operational
Hard Boot NVIDIA Standard
Power ON/OFF, ON for 150 sec OFF for 30 sec 1000 cycles at 25°C, 1000 cycles at -40°C
Operational Low Temp NVIDIA Standard
-5°C, 24 hours, operational
Operational High Temp NVIDIA Standard
40°C, 90%RH, 168 hours, operational
MTBF / Failure Rate: 3,371K Hours Telcordia SR-332, ISSUE 3 Parts Count (Method I)
Controlled Environment (GB), T = 35°C, CL = 90%
MTBF / Failure Rate: 1,836K Hours Telcordia SR-332, ISSUE 3 Parts Count (Method I)
Uncontrolled Environment (GF), T = 35°C, CL = 90%
MTBF / Failure Rate: 957K Hours Telcordia SR-332, ISSUE 3 Parts Count (Method I)
Uncontrolled Environment (GM), T = 35°C, CL = 90%


5.4 Pinout
Signal Name   Pin # (Top, Odd)   Pin # (Bottom, Even)   Signal Name      Signal Name   Pin # (Top, Odd)   Pin # (Bottom, Even)   Signal Name
GND 1 2 GND   PCIE0_RX0_P 133 134 PCIE0_TX0_N
CSI1_D0_N 3 4 CSI0_D0_N   GND 135 136 PCIE0_TX0_P
CSI1_D0_P 5 6 CSI0_D0_P   PCIE0_RX1_N 137 138 GND
GND 7 8 GND   PCIE0_RX1_P 139 140 PCIE0_TX1_N
RSVD 9 10 CSI0_CLK_N   GND 141 142 PCIE0_TX1_P
RSVD 11 12 CSI0_CLK_P   RSVD 143 144 GND
GND 13 14 GND   KEY KEY KEY KEY
CSI1_D1_N 15 16 CSI0_D1_N   RSVD 145 146 GND
CSI1_D1_P 17 18 CSI0_D1_P   GND 147 148 PCIE0_TX2_N
GND 19 20 GND   PCIE0_RX2_N 149 150 PCIE0_TX2_P
CSI3_D0_N 21 22 CSI2_D0_N   PCIE0_RX2_P 151 152 GND
CSI3_D0_P 23 24 CSI2_D0_P   GND 153 154 PCIE0_TX3_N
GND 25 26 GND   PCIE0_RX3_N 155 156 PCIE0_TX3_P
CSI3_CLK_N 27 28 CSI2_CLK_N   PCIE0_RX3_P 157 158 GND
CSI3_CLK_P 29 30 CSI2_CLK_P   GND 159 160 PCIE0_CLK_N
GND 31 32 GND   USBSS_RX_N 161 162 PCIE0_CLK_P
CSI3_D1_N 33 34 CSI2_D1_N   USBSS_RX_P 163 164 GND
CSI3_D1_P 35 36 CSI2_D1_P   GND 165 166 USBSS_TX_N
GND 37 38 GND   RSVD 167 168 USBSS_TX_P
DP0_TXD0_N 39 40 CSI4_D2_N   RSVD 169 170 GND
DP0_TXD0_P 41 42 CSI4_D2_P   GND 171 172 RSVD
GND 43 44 GND   RSVD 173 174 RSVD
DP0_TXD1_N 45 46 CSI4_D0_N   RSVD 175 176 GND
DP0_TXD1_P 47 48 CSI4_D0_P   GND 177 178 MOD_SLEEP*
GND 49 50 GND   PCIE_WAKE* 179 180 PCIE0_CLKREQ*
DP0_TXD2_N 51 52 CSI4_CLK_N   PCIE0_RST* 181 182 RSVD
DP0_TXD2_P 53 54 CSI4_CLK_P   RSVD 183 184 GBE_MDI0_N
GND 55 56 GND   I2C0_SCL 185 186 GBE_MDI0_P
DP0_TXD3_N 57 58 CSI4_D1_N   I2C0_SDA 187 188 GBE_LED_LINK
DP0_TXD3_P 59 60 CSI4_D1_P   I2C1_SCL 189 190 GBE_MDI1_N
GND 61 62 GND   I2C1_SDA 191 192 GBE_MDI1_P
DP1_TXD0_N 63 64 CSI4_D3_N   I2S0_DOUT 193 194 GBE_LED_ACT
DP1_TXD0_P 65 66 CSI4_D3_P   I2S0_DIN 195 196 GBE_MDI2_N
GND 67 68 GND   I2S0_FS 197 198 GBE_MDI2_P
DP1_TXD1_N 69 70 DSI_D0_N   I2S0_SCLK 199 200 GND
DP1_TXD1_P 71 72 DSI_D0_P   GND 201 202 GBE_MDI3_N
GND 73 74 GND   UART1_TXD 203 204 GBE_MDI3_P
DP1_TXD2_N 75 76 DSI_CLK_N   UART1_RXD 205 206 GPIO07
DP1_TXD2_P 77 78 DSI_CLK_P   UART1_RTS* 207 208 GPIO08
GND 79 80 GND   UART1_CTS* 209 210 CLK_32K_OUT
DP1_TXD3_N 81 82 DSI_D1_N   GPIO09 211 212 GPIO10
DP1_TXD3_P 83 84 DSI_D1_P   CAM_I2C_SCL 213 214 FORCE_RECOVERY*
GND 85 86 GND   CAM_I2C_SDA 215 216 GPIO11
GPIO0 87 88 DP0_HPD   GND 217 218 GPIO12
SPI0_MOSI 89 90 DP0_AUX_N   SDMMC_DAT0 219 220 I2S1_DOUT
SPI0_SCK 91 92 DP0_AUX_P   SDMMC_DAT1 221 222 I2S1_DIN
SPI0_MISO 93 94 HDMI_CEC   SDMMC_DAT2 223 224 I2S1_FS
SPI0_CS0* 95 96 DP1_HPD   SDMMC_DAT3 225 226 I2S1_SCLK
SPI0_CS1* 97 98 DP1_AUX_N   SDMMC_CMD 227 228 GPIO13
UART0_TXD 99 100 DP1_AUX_P   SDMMC_CLK 229 230 GPIO14
UART0_RXD 101 102 GND   GND 231 232 I2C2_SCL
UART0_RTS* 103 104 SPI1_MOSI   SHUTDOWN_REQ* 233 234 I2C2_SDA
UART0_CTS* 105 106 SPI1_SCK   PMIC_BBAT 235 236 UART2_TXD
GND 107 108 SPI1_MISO   POWER_EN 237 238 UART2_RXD
USB0_D_N 109 110 SPI1_CS0*   SYS_RESET* 239 240 SLEEP/WAKE*
USB0_D_P 111 112 SPI1_CS1*   GND 241 242 GND
GND 113 114 CAM0_PWDN   GND 243 244 GND
USB1_D_N 115 116 CAM0_MCLK   GND 245 246 GND
USB1_D_P 117 118 GPIO01   GND 247 248 GND
GND 119 120 CAM1_PWDN   GND 249 250 GND
USB2_D_N 121 122 CAM1_MCLK   VDD_IN 251 252 VDD_IN
USB2_D_P 123 124 GPIO02   VDD_IN 253 254 VDD_IN
GND 125 126 GPIO03   VDD_IN 255 256 VDD_IN
GPIO04 127 128 GPIO05   VDD_IN 257 258 VDD_IN
GND 129 130 GPIO06   VDD_IN 259 260 VDD_IN
PCIE0_RX0_N 131 132 GND


5.5 Package Drawing and Dimensions

Figure 7 Module Top and Side View with Cover Outline

Figure 8 Module Bottom with Cover Outline


Figure 9 Module Top Showing DRAM Placement and Side View

Figure 10 Module Bottom Showing EMMC Placement


Notice

The information provided in this specification is believed to be accurate and reliable as of the date provided. However, NVIDIA Corporation
(“NVIDIA”) does not give any representations or warranties, expressed or implied, as to the accuracy or completeness of such information.
NVIDIA shall have no liability for the consequences or use of such information or for any infringement of patents or other rights of third
parties that may result from its use. This publication supersedes and replaces all other specifications for the product that may have been
previously supplied.

NVIDIA reserves the right to make corrections, modifications, enhancements, improvements, and other changes to this specification, at any
time and/or to discontinue any product or service without notice. Customer should obtain the latest relevant specification before placing
orders and should verify that such information is current and complete.

NVIDIA products are sold subject to the NVIDIA standard terms and conditions of sale supplied at the time of order acknowledgement,
unless otherwise agreed in an individual sales agreement signed by authorized representatives of NVIDIA and customer. NVIDIA hereby
expressly objects to applying any customer general terms and conditions with regard to the purchase of the NVIDIA product referenced in
this specification.

NVIDIA products are not designed, authorized or warranted to be suitable for use in medical, military, aircraft, space or life support
equipment, nor in applications where failure or malfunction of the NVIDIA product can reasonably be expected to result in personal injury,
death or property or environmental damage. NVIDIA accepts no liability for inclusion and/or use of NVIDIA products in such equipment or
applications and therefore such inclusion and/or use is at customer’s own risk.

NVIDIA makes no representation or warranty that products based on these specifications will be suitable for any specified use without
further testing or modification. Testing of all parameters of each product is not necessarily performed by NVIDIA. It is customer’s sole
responsibility to ensure the product is suitable and fit for the application planned by customer and to do the necessary testing for the
application in order to avoid a default of the application or the product. Weaknesses in customer’s product designs may affect the quality
and reliability of the NVIDIA product and may result in additional or different conditions and/or requirements beyond those contained in this
specification. NVIDIA does not accept any liability related to any default, damage, costs or problem which may be based on or attributable
to: (i) the use of the NVIDIA product in any manner that is contrary to this specification, or (ii) customer product designs.

No license, either express or implied, is granted under any NVIDIA patent right, copyright, or other NVIDIA intellectual property right under
this specification. Information published by NVIDIA regarding third-party products or services does not constitute a license from NVIDIA to
use such products or services or a warranty or endorsement thereof. Use of such information may require a license from a third party under
the patents or other intellectual property rights of the third party, or a license from NVIDIA under the patents or other intellectual property
rights of NVIDIA. Reproduction of information in this specification is permissible only if reproduction is approved by NVIDIA in writing, is
reproduced without alteration, and is accompanied by all associated conditions, limitations, and notices.

ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER
DOCUMENTS (TOGETHER AND SEPARATELY, “MATERIALS”) ARE BEING PROVIDED “AS IS.” NVIDIA MAKES NO WARRANTIES,
EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL
IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE.
Notwithstanding any damages that customer might incur for any reason whatsoever, NVIDIA’s aggregate and cumulative liability towards
customer for the products described herein shall be limited in accordance with the NVIDIA terms and conditions of sale for the product.
HDMI

HDMI, the HDMI logo, and High-Definition Multimedia Interface are trademarks or registered trademarks of HDMI Licensing LLC.

ARM

ARM, AMBA and ARM Powered are registered trademarks of ARM Limited. Cortex, MPCore and Mali are trademarks of ARM Limited. All
other brands or product names are the property of their respective holders. "ARM" is used to represent ARM Holdings plc; its operating
company ARM Limited; and the regional subsidiaries ARM Inc.; ARM KK; ARM Korea Limited.; ARM Taiwan Limited; ARM France SAS;
ARM Consulting (Shanghai) Co. Ltd.; ARM Germany GmbH; ARM Embedded Technologies Pvt. Ltd.; ARM Norway, AS and ARM Sweden
AB

OpenCL

OpenCL is a trademark of Apple Inc. used under license to the Khronos Group Inc.

Trademarks

NVIDIA, the NVIDIA logo, and Tegra are trademarks and/or registered trademarks of NVIDIA Corporation in the U.S. and other countries.
Other company and product names may be trademarks of the respective companies with which they are associated.

Copyright

© 2014 – 2020 NVIDIA Corporation. All rights reserved

NVIDIA Corporation | 2788 San Tomas Expressway | Santa Clara, CA 95051 | www.nvidia.com

21.3. Jetson Nano Developer Kit - User Guide

JETSON NANO
DEVELOPER KIT

DA_09402_004 | January 15, 2020

User Guide

DOCUMENT CHANGE HISTORY

DA_09402_004

Version Date Authors Description of Change


1.0 March 18, 2019 plawrence Initial release

1.1 July 8, 2019 plawrence Updates concurrent with Jetson Linux Driver Package release 32.2. Added and corrected carrier board information; added information about NVIDIA Container Runtime; added information about first boot configuration process.

1.2 January 7, 2020 ssheshadri Updates concurrent with Jetson Linux Driver Package release 32.3.1. Added references to VPI.

1.3 January 15, 2020 ssheshadri Documentation added for rev B01.

NOTE
Welcome to the NVIDIA Jetson platform! There are two key things you should do right away:
1. Sign up for the NVIDIA Developer Program – this enables you to ask
questions and contribute on the NVIDIA Jetson Forums, gives access to all
documentation and collateral on the Jetson Download Center, and more.
2. Read this User Guide! After that, check out these important links:
• Jetson FAQ – Please read the FAQ.
• Support Resources – This web page links to important resources, including
the Jetson Forum and the Jetson Ecosystem page.
• NVIDIA Jetson Linux Driver Package Developer Guide – Jetson Linux
Driver Package is a key component of the Jetson platform, and provides
the sample filesystem for your developer kit. Comprehensive
documentation may be found in the Developer Guide.
Thanks,
The NVIDIA Jetson team


TABLE OF CONTENTS

Note ........................................................................................... ii

Developer Kit Setup and Hardware..................................................... 1


Included in the Box ............................................................................. 2
Developer Kit Setup ............................................................................ 2
Developer Kit Interfaces ....................................................................... 2
Interface Details ............................................................................. 3
Power Guide .................................................................................. 6

JetPack ....................................................................................... 7
Summary of JetPack Components ............................................................ 7
How to Install JetPack ......................................................................... 9
Initial Configuration upon First Boot ........................................................ 10

Working with Jetson Linux Driver Package ......................................... 11

Compliance Information ................................................................ 12


United States ................................................................................... 12
Canada........................................................................................... 13
Innovation, Science and Economic Development Canada (ISED) .................. 13
European Union ................................................................................ 14
Australia and New Zealand ................................................................... 14
Japan ............................................................................................ 14
South Korea ..................................................................................... 17
Russia/Kazakhstan/Belarus ................................................................... 19
Taiwan ........................................................................................... 19
China ............................................................................................. 20
India.............................................................................................. 22


DEVELOPER KIT SETUP AND HARDWARE

The NVIDIA® Jetson Nano™ Developer Kit is an AI computer for makers, learners, and
developers that brings the power of modern artificial intelligence to a low-power, easy-
to-use platform. Get started quickly with out-of-the-box support for many popular
peripherals, add-ons, and ready-to-use projects.

A Jetson Nano Developer Kit includes a non-production specification Jetson module (P3448-0000) attached to a reference carrier board (P3449-0000). This user guide covers two revisions of the developer kit:
• The latest Jetson Nano Developer Kit (part number 945-13450-0000-100), which
includes carrier board revision B01.
• The original Jetson Nano Developer Kit (part number 945-13450-0000-000), which
includes carrier board revision A02.

Your carrier board revision is the last three characters of the 180-level part number, which is printed on the underside of the carrier board. Your Jetson Nano Developer Kit part number is printed on the developer kit box.

Note The B01 revision carrier board is compatible with the production
specification Jetson Nano module. The A02 revision carrier board is
not.
Both revisions of the carrier board are described in this user guide.

Jetson Nano is supported by the comprehensive NVIDIA® JetPack™ SDK, and has the
performance and capabilities needed to run modern AI workloads. JetPack includes:
• Full desktop Linux with NVIDIA drivers
• AI and Computer Vision libraries and APIs
• Developer tools
• Documentation and sample code


INCLUDED IN THE BOX


• A non-production Jetson Nano module (P3448-0000) with heatsink
• A reference carrier board (P3449-0000)
• A small paper card with quick start and support information
• A folded paper stand for the developer kit

DEVELOPER KIT SETUP


Before using your developer kit, you need to set up a microSD card with the operating
system and JetPack components. The simplest method is to download the microSD card
image and follow instructions found in Getting Started with Jetson Nano Developer Kit.

In summary:
• You need a 16 GB or larger UHS-1 microSD card, HDMI or DP monitor, USB
keyboard and mouse, and 5V⎓2A Micro-USB power supply.
• Download the image and write it to the microSD card.
• Insert the microSD card into the slot under the Jetson Nano module, then attach the
display, keyboard, mouse, and Ethernet cable or wireless networking adapter.
• Connect the Micro-USB power supply. The developer kit powers on automatically.

For alternative methods, see How to Install JetPack, below.

DEVELOPER KIT INTERFACES


Developer kit module and carrier boards: front and rear views


Developer kit carrier boards: rev A02 top view

Developer kit module and carrier board: rev B01 top view

Interface Details
This section highlights some of the Jetson Nano Developer Kit interfaces. See the Jetson
Nano Developer Kit Carrier Board Specification for comprehensive information.

Module
• [J501] Slot for a microSD card.


• The passive heatsink supports 10W module power usage at 25° C ambient
temperature. If your use case requires additional cooling, you can configure the
module to control a system fan. See the Jetson Nano Supported Component List for
fans that have been verified for attachment to the heatsink.

Carrier Board
• [DS3] Power LED; lights when the developer kit is powered on.
• [J2] SO-DIMM connector for Jetson Nano module.
• [J6] HDMI and DP connector stack.
• [J13] Camera connector; enables use of CSI cameras. Jetson Nano Developer Kit
works with IMX219 camera modules, including Leopard Imaging LI-IMX219-MIPI-
FF-NANO camera module and Raspberry Pi Camera Module V2.
• [J15] 4-pin fan control header. Pulse Width Modulation (PWM) output and
tachometer input are supported.
• [J18] M.2 Key E connector can be used for wireless networking cards; includes
interfaces for PCIe (x1), USB 2.0, UART, I2S, and I2C.
To reach J18 you must detach the Jetson Nano module.
• [J25] Power jack for 5V⎓4A power supply. (The maximum supported continuous
current is 4.4A.) Accepts a 2.1×5.5×9.5 mm plug with positive polarity.
• [J28] Micro-USB 2.0 connector; can be used in either of two ways:
• If J48 pins are not connected, you can power the developer kit from a 5V⎓2A
Micro-USB power supply.
• If J48 pins are connected, operates in Device Mode.
• [J32 and J33] are each a stack of two USB 3.0 Type A connectors. Each stack is limited
to 1A total power delivery. All four are connected to the Jetson Nano module via a
USB 3.0 hub built into the carrier board.
• [J38] The Power over Ethernet (POE) header exposes any DC voltage present on J43
Ethernet jack per IEEE 802.3af.
• [J40] Carrier board rev A02 only: 8-pin button header; brings out several system
power, reset, and force recovery related signals (see the following diagram).

• Pins 7 and 8 disable auto power-on.


• Pins 1 and 2 initiate power-on if auto power-on is disabled.
• Pins 5 and 6 reset the system.


• Pins 3 and 4 put the developer kit into Force Recovery Mode if they are
connected when it is powered on.
• [J41] 40-pin expansion header includes:
• Power pins.
Two 3.3V power pins and two 5V power pins. These are not switchable; power is
always available when the developer kit is connected to power.
Two 5V pins can be used to power the developer kit at 2.5A each.
• Interface signal pins.
All signals use 3.3V levels.
By default, all interface signal pins are configured as GPIOs, except pins 3 and 5
and pins 27 and 28, which are I2C SDA and SCL, and pins 8 and 10, which are
UART TX and RX. L4T includes a Python library, Jetson.GPIO, for controlling GPIOs; a minimal usage sketch follows this list. See /opt/nvidia/jetson-gpio/doc/README.txt on your Jetson system for details.
L4T includes the jetson-io utility to configure pins for SFIOs. See
“Configuring the 40-pin Expansion Header” in the L4T Development Guide for
more information.
• [J43] RJ45 connector for gigabit Ethernet.
• [J44] Carrier board rev A02 only: 3.3V serial port header; provides access to the
UART console.
• [J48] Enables either J28 Micro-USB connector or J25 power jack as power source for
the developer kit. Without a jumper, the developer kit can be powered by J28 Micro-
USB connector. With a jumper, no power is drawn from J28, and the developer kit
can be powered via J25 power jack.
• [J49] Carrier board rev B01 only: Camera connector; same as [J13].
• [J50] Carrier Board Rev B01 only: 12-pin button header; brings out system power,
reset, UART console, and force recovery related signals:

• Pin 1 connects to LED Cathode to indicate System Sleep/Wake (Off when system
is in sleep mode).
• Pin 2 connects to LED Anode.
• Pins 3 and 4 are respectively UART Receive and Send.
• Pins 5 and 6 disable auto power-on if connected.
• Pins 7 and 8 reset the system if connected when the system is running.
• Pins 9 and 10 put the developer kit into Force Recovery Mode if they are
connected when it is powered on.
• Pins 11 and 12 initiate power-on when connected if auto power-on is disabled.
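As referenced in the J41 item above, the following is a minimal Jetson.GPIO sketch; header pin 12 is an arbitrary choice for the example, and any pin left in its default GPIO function can be substituted.

```python
# Minimal Jetson.GPIO sketch: blink 40-pin header pin 12 (physical BOARD numbering).
# Pin 12 is an arbitrary choice; it must be left in its default GPIO function.
import time
import Jetson.GPIO as GPIO

OUTPUT_PIN = 12  # physical pin number on the J41 40-pin expansion header

GPIO.setmode(GPIO.BOARD)                           # address pins by header position
GPIO.setup(OUTPUT_PIN, GPIO.OUT, initial=GPIO.LOW)

try:
    for _ in range(10):                            # toggle for about 10 seconds
        GPIO.output(OUTPUT_PIN, GPIO.HIGH)
        time.sleep(0.5)
        GPIO.output(OUTPUT_PIN, GPIO.LOW)
        time.sleep(0.5)
finally:
    GPIO.cleanup()                                 # release the pin on exit
```

Because these header signals pass through weak TXB level shifters (see the application notes later in this chapter), drive LEDs or other low-impedance loads through a transistor rather than directly from the pin.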


Power Guide
Jetson Nano Developer Kit requires a 5V power supply capable of supplying 2A current.

Micro-USB Power Supply Options


Out of the box, the developer kit is configured to accept power via the Micro-USB
connector. Note that some Micro-USB power supplies are designed to output slightly
more than 5V to account for voltage loss across the cable. For example, Adafruit’s
GEO151UB-6025 Power Supply (validated by NVIDIA for use with the Jetson Nano
Developer Kit) is designed to provide 5.25V. The critical point is that the Jetson Nano
module requires a minimum of 4.75V to operate. Use a power supply capable of
delivering 5V at the J28 Micro-USB connector.

Other Power Supply Options


If the developer kit’s total load is expected to exceed 2A, e.g., due to peripherals
attached to the carrier board, connect the J48 Power Select Header pins to disable power
supply via Micro-USB and enable 5V⎓4A via the J25 power jack. Another option is to
supply 5V⎓5A via the J41 expansion header (2.5A per pin).

The J25 power jack is 9.5 mm deep, and accepts positive polarity plugs with 2.1 mm
inner diameter and 5.5 mm outer diameter. As an example, NVIDIA has validated
Adafruit’s GEO241DA-0540 Power Supply for use with Jetson Nano Developer Kit.

Power Budget Considerations


The developer kit’s total power usage is the sum of carrier board, module, and
peripheral power usage, as determined by your particular use case.

The carrier board consumes between 0.5W (at 2A) and 1.25W (at 4A) with no peripherals
attached.

The Jetson Nano module is designed to optimize power efficiency and supports two
software-defined power modes. The default mode provides a 10W power budget for the
module, and the other, a 5W budget. These power modes constrain the module to near
their 10W or 5W budgets by capping the GPU and CPU frequencies and the number of
online CPU cores at a pre-qualified level. See the NVIDIA Jetson Linux Driver Package
Developer Guide for details about power modes.

Note that the power mode budgets cover the two major power domains for the Jetson
Nano module: GPU (VDD_GPU) and CPU (VDD_CPU). Individual parts of the CORE
(VDD_CORE) power domain, such as video encode and video decode, are not covered
by these budgets. This is a reason why power modes constrain the module to near a
power budget, but not to the exact power budget. Your particular use case determines
the module’s actual power consumption. See the Jetson Nano module Data Sheet for details
about how power domains are used to optimize power consumption.

Attached peripherals are the final component of the developer kit’s total power usage.
Select a power supply that is capable of delivering sufficient power for your workload.
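As a rough sketch of this budgeting, the calculation below combines the module's 10W mode and the carrier board's upper figure with an assumed peripheral load of two USB devices at roughly 0.5A each, then checks the total against the 5V⎓2A Micro-USB and 5V⎓4A barrel-jack options.

```python
# Rough power-budget sketch for the developer kit (illustrative numbers only).
SUPPLY_VOLTAGE = 5.0             # volts, nominal VDD_IN

module_w = 10.0                  # module running in the default 10 W power mode
carrier_w = 1.25                 # carrier board at its upper figure quoted above
peripherals_w = 2 * (5.0 * 0.5)  # assumption: two USB devices at ~0.5 A each

total_w = module_w + carrier_w + peripherals_w
total_a = total_w / SUPPLY_VOLTAGE

print(f"Estimated load: {total_w:.2f} W, i.e. {total_a:.2f} A at {SUPPLY_VOLTAGE:.1f} V")
print("5V-2A Micro-USB supply sufficient:", total_a <= 2.0)    # False for this example
print("5V-4A barrel-jack supply sufficient:", total_a <= 4.0)  # True for this example
```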


JETPACK

NVIDIA JetPack SDK is the most comprehensive solution for building AI applications. It
includes the latest OS images for Jetson products, along with libraries and APIs,
samples, developer tools, and documentation.

SUMMARY OF JETPACK COMPONENTS


This section briefly describes each component of JetPack. For additional details about
these components, see the online documentation for JetPack at:

https://docs.nvidia.com/jetson/jetpack/index.html

OS Image
JetPack includes a reference file system derived from Ubuntu.

Libraries and APIs


JetPack libraries and APIs include:
• TensorRT and cuDNN for high-performance deep learning applications
• CUDA for GPU accelerated applications across multiple domains
• NVIDIA Container Runtime for containerized GPU accelerated applications
• Multimedia API package for camera applications and sensor driver development
• VisionWorks, OpenCV, and VPI (Developer Preview) for visual computing
applications
• Sample applications


Sample Applications
JetPack includes several samples which demonstrate the use of JetPack components.
These are stored in the reference filesystem and can be compiled on the developer kit.

JetPack component Sample locations on reference filesystem


TensorRT /usr/src/tensorrt/samples/
cuDNN /usr/src/cudnn_samples_<version>/
CUDA /usr/local/cuda-<version>/samples/
Multimedia API /usr/src/tegra_multimedia_api/
VisionWorks /usr/share/visionworks/sources/samples/
/usr/share/visionworks-tracking/sources/samples/
/usr/share/visionworks-sfm/sources/samples/
OpenCV /usr/share/OpenCV/samples/
VPI /opt/nvidia/vpi/vpi-0.0/samples

Developer Tools
JetPack includes the following developer tools. Some are used directly on a Jetson
system, and others run on a Linux host computer connected to a Jetson system.
• Tools for application development and debugging:
• Nsight Eclipse Edition for development of GPU accelerated applications: Runs
on Linux host computer. Supports all Jetson products.
• CUDA-GDB for application debugging: Runs on the Jetson system or the Linux
host computer. Supports all Jetson products.
• CUDA-MEMCHECK for debugging application memory errors: Runs on the
Jetson system. Supports all Jetson products.
• Tools for application profiling and optimization:
• Nsight Systems for application profiling across GPU and CPU: Runs on the
Linux host computer. Supports all Jetson products.
• nvprof for application profiling across GPU and CPU: Runs on the Jetson system.
Supports all Jetson products.
• Visual Profiler for application profiling across GPU and CPU: Runs on Linux
host computer. Supports all Jetson products.
• Nsight Graphics for graphics application debugging and profiling: Runs on the
Linux host computer. Supports all Jetson products.

Documentation
Documents that are relevant to developers using JetPack include:
• JetPack Documentation
• NVIDIA Jetson Linux Driver Package Developer Guide
• Tegra NVIDIA Jetson Linux Driver Package Release Notes
• TensorRT Documentation
• cuDNN Documentation
• CUDA Toolkit
• NVIDIA Container Runtime
• OpenCV Documentation
• Multimedia API Reference
• VisionWorks Documentation
• Nsight Eclipse Edition Documentation
• CUDA-GDB Documentation
• CUDA-MEMCHECK Documentation
• Nsight Systems
• nvprof
• Visual Profiler
• Nsight Graphics
• Nsight Compute CLI
• VPI – Vision Programming Interface

HOW TO INSTALL JETPACK


There are two ways to install JetPack on your developer kit:
• Use an SD Card image.
Follow the steps in Getting Started with Jetson Nano Developer Kit to download the
system image and use SD Card writing software to flash it to a microSD
card. Then use the microSD card to boot the developer kit.
• Use NVIDIA SDK Manager.
You must have a Linux host computer with a working Internet connection to run
SDK Manager and flash the developer kit. Supported host operating systems are:
• Ubuntu Linux x64 Version 18.04 or 16.04
Follow these instructions to download and install NVIDIA SDK Manager.

Note Use of SDK Manager to install JetPack requires that:


• The developer kit be in Force Recovery mode.
• The developer kit not be powered by a Micro-USB power supply.
The Micro-USB port is needed to flash and update the developer kit.

Before using SDK Manager, follow these steps to power your developer kit and put it
into Force Recovery mode:
1. Jumper the Force Recovery pins (3 and 4) on J40 button header
2. Jumper the J48 Power Select Header pins and connect a power supply to J25 power
jack. The developer kit automatically powers on in Force Recovery mode.
3. Remove the Force Recovery pins’ jumper when the developer kit powers on.

Now use SDK Manager to flash your developer kit with the OS image and install other
Jetpack components. SDK Manager can also set up your Linux host computer
development environment. For full instructions, see the SDK Manager documentation.


INITIAL CONFIGURATION UPON FIRST BOOT


Whether you use the SD Card image or use SDK Manager to flash your developer kit,
upon first boot it will prompt you for initial configuration information like keyboard
layout, username and password, etc.

Headless initial configuration


If no display is attached to the developer kit during first boot, the initial configuration
process is “headless.” That is, you must communicate with the developer kit through a
serial application on the host computer (e.g., puTTY) connected via a host serial port and
the developer kit’s Micro-USB port.

Note Headless initial configuration requires that the developer kit not be
powered by a Micro-USB power supply, since the Micro-USB port is
needed to access the initial configuration prompts.
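For a scripted alternative to a terminal program such as puTTY, the minimal pyserial sketch below checks that the developer kit's serial console answers over the Micro-USB connection; the /dev/ttyACM0 device name and 115200 baud rate are assumptions, so confirm the actual values with dmesg on your host before continuing the configuration interactively.

```python
# Sketch: check that the developer kit's USB serial console answers from a Linux host.
# Assumes the kit enumerates as /dev/ttyACM0 at 115200 baud; confirm with `dmesg`.
import serial  # provided by the pyserial package (pip install pyserial)

with serial.Serial("/dev/ttyACM0", baudrate=115200, timeout=1) as console:
    console.write(b"\r\n")               # nudge the configuration prompt
    for _ in range(20):                  # print whatever the kit sends back
        line = console.readline()
        if not line:
            break
        print(line.decode(errors="replace"), end="")
```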


WORKING WITH JETSON LINUX DRIVER PACKAGE

NVIDIA® Jetson™ Linux Driver Package (L4T) is the operating system component of
JetPack, and provides the Linux kernel, Bootloader, Jetson Board Support Package (BSP),
and sample filesystem for Jetson developer kits. L4T is included on the Jetson Nano SD
Card image. Alternatively, you can use SDK Manager to install L4T along with all the
other JetPack components to your developer kit.

L4T is also available for download directly from the main L4T page on the Jetson
Developer Site. See the “Quick Start Guide” section of the NVIDIA Jetson Linux Driver
Package Developer Guide for flashing instructions.

The “Platform Adaptation and Bring-Up” topic in the Developer Guide describes how to
port the Jetson BSP and bootloader from your developer kit to a new hardware platform
incorporating the Jetson module. Porting L4T to a new device enables use of the other
JetPack components on that device, along with the software you’ve created using the
developer kit.


COMPLIANCE INFORMATION

The NVIDIA Jetson Nano Developer Kit is compliant with the regulations listed in this
section.

UNITED STATES
Federal Communications Commission (FCC)

This device complies with part 15 of the FCC Rules. Operation is subject to the following
two conditions: (1) this device may not cause harmful interference, and (2) this device
must accept any interference received, including any interference that may cause
undesired operation of the device.

This equipment has been tested and found to comply with the limits for a Class B digital
device, pursuant to Part 15 of the FCC Rules. These limits are designed to provide
reasonable protection against harmful interference in a residential installation. This
equipment generates, uses, and can radiate radio frequency energy and, if not installed
and used in accordance with the instructions, may cause harmful interference to radio
communications. However, there is no guarantee that interference will not occur in a
particular installation.

If this equipment does cause harmful interference to radio or television reception, which
can be determined by turning the equipment off and on, the user is encouraged to try to
correct the interference by one or more of the following measures:


• Reorient or relocate the receiving antenna.
• Increase the separation between the equipment and receiver.
• Connect the equipment into an outlet on a circuit different from that to which the receiver is connected.
• Consult the dealer or an experienced radio/TV technician for help.

FCC Warning: The FCC requires that you be notified that any changes or modifications
to this device not expressly approved by the manufacturer could void the user’s
authority to operate the equipment.

Underwriters Laboratories (UL)


UL listed Product Logo for Jetson Nano Developer Kit, model name P3450.

UL Recognized Component Logo for Embedded System Module for Jetson Nano, model
number P3448

CANADA
Innovation, Science and Economic Development Canada (ISED)
CAN ICES-3(B)/NMB-3(B)
This device complies with Industry Canada license-exempt RSS standard(s). Operation
is subject to the following two conditions: (1) this device may not cause interference, and
(2) this device must accept any interference, including interference that may cause
undesired operation of the device.
Le présent appareil est conforme aux CNR d'Industrie Canada applicables aux appareils
radio exempts de licence. L'exploitation est autorisée aux deux conditions suivantes : (1)
l'appareil ne doit pas produire de brouillage, et (2) l'utilisateur de l'appareil doit accepter
tout brouillage radioélectrique subi, même si le brouillage est susceptible d'en
compromettre le fonctionnement.


EUROPEAN UNION
European Conformity; Conformité Européenne (CE)

This device complies with the following Directives:


• Electromagnetic Compatibility Directive 2014/30/EU
• Low Voltage Directive 2014/35/EU
• RoHS Directive 2011/65/EU

The full text of the EU declaration of conformity is available at the following internet address: www.nvidia.com/support

A copy of the Declaration of Conformity to the essential requirements may be obtained directly from NVIDIA GmbH (Floessergasse 2, 81369 Munich, Germany).

AUSTRALIA AND NEW ZEALAND


Australian Communications and Media Authority

This product meets the applicable EMC requirements for Class B, I.T.E equipment and applicable
radio equipment requirements

JAPAN
Voluntary Control Council for Interference (VCCI)


Japan RoHS Material Content Declaration

日本工業規格JIS C 0950:2008により、2006年7月1日以降に販売される特定分野の電気および
電子機器について、製造者による含有物質の表示が義務付けられます。

機器名称: Jetson Nano Developer Kit 開発者コンポーネント

特定化学物質記号
主な分類
Pb Hg Cd Cr(VI) PBB PBDE

PCBボード 0 0 0 0 0 0

除外項
パッシブ電子部品 0 0 0 0 0

除外項
アクティブ電子部品 0 0 0 0 0

除外項
コネクター / ケーブル 0 0 0 0 0

プロセッサー 0 0 0 0 0 0

メモリ 0 0 0 0 0 0

除外項
機械部品 0 0 0 0 0

はんだ付け材料 0 0 0 0 0 0

フラックス、クリームはん
0 0 0 0 0 0
だ、ラベル、その他消耗品
注:

1.「0」は、特定化学物質の含有率が日本工業規格JIS C 0950:2008に記載されている含有率基
準値より低いことを示します。


2.「除外項目」は、特定化学物質が含有マークの除外項目に該当するため、特定化学物質につ
いて、日本工業規格JIS C 0950:2008に基づく含有マークの表示が不要であることを示しま
す。

3.「0.1wt%超」または「0.01wt%超」は、特定化学物質の含有率が日本工業規格JIS C
0950:2008 に記載されている含有率基準値を超えていることを示します。

A Japanese regulatory requirement, defined by specification JIS C 0950: 2008, mandates that
manufacturers provide Material Content Declarations for certain categories of electronic
products offered for sale after July 1, 2006.

Product Model Number: Jetson Nano Developer Kit

Symbols of Specified Chemical Substance


Major Classification
Pb Hg Cd Cr(VI) PBB PBDE

PCB 0 0 0 0 0 0

Passive components Exempt 0 0 0 0 0

Active components Exempt 0 0 0 0 0

Connectors/Cables Exempt 0 0 0 0 0

Processor 0 0 0 0 0 0

Memory 0 0 0 0 0 0

Mechanicals Exempt 0 0 0 0 0

Soldering material 0 0 0 0 0 0

Flux, Solder Paste, label and other consumable materials 0 0 0 0 0 0
Notes:

1. “0” indicates that the level of the specified chemical substance is less than the threshold
level specified in the standard, JIS C 0950: 2008.

2. “Exempt” indicates that the specified chemical substance is exempt from marking and it is

not required to display the marking for that specified chemical substance per the standard, JIS
C 0950: 2008.

3. “Exceeding 0.1wt%” or “Exceeding 0.01wt%” is entered in the table if the level of the
specified chemical substance exceeds the threshold level specified in the standard, JIS C 0950:
2008.

SOUTH KOREA
Radio Research Agency (RRA)

R-R-NVA-P3450 R-R-NVA-P3448
B급 기기 이 기기는 가정용(B급) 전자파적합기기로서 주
로 가정에서 사용하는 것을 목적으로 하며, 모
든 지역에서 사용할 수 있습니다.

Korea RoHS Material Content Declaration

확인 및 평가 양식은 제품에 포함 된 유해 물질의 허용 기준의 준수에 관한

110181-
상호 : 앤비디아홍콩홀딩즈리미티드( 영업소) 법인등록번호
0036373

준비 120-84-
대표자성명 카렌테레사번즈 사업자등록번호:
06711
주소 서울특별시 강남구 영동대로 511, 2101호 ( 삼성동, 코엑스무역타워)
제품 내용
제품의 종류 해당없음 제품명(규격) 해당없음

세부모델명(번호): 해당없음 제품출시일 해당없음

제품의 중량 해당없음 제조, 수입업자 앤비디아


엔비디아의 그래픽 카드제품은 전기 전자제품 및 자동차의 자원순환에 관한 법률 시행령 제


11조 제 1항에 의거한 법 시행행규칙 제 3조에에따른 유해물질함유 기준을 확인 및 평가한
결과, 이를 준수하였음을 공표합니다.

구비서류 : 없음
작성방법

① 제품의 종류는 "전기.전자제품 및 자동차의 자원순환에관한 법률 시행령" 제 8조 제 1항


및 제 2항에 따른 품목별로 구분하여 기재합니다.

② 전기 전자 제품의 경우 모델명 (번호), 자동차의 경우, 제원관리번호를 기재합니다.

③ 해당제품의 제조업자 또는 수입업자를 기재합니다.

Confirmation and Evaluation Form Concerning the Adherence to Acceptable


Standards of Hazardous Materials Contained in Products

Statement prepared by:
Company Name: Nvidia HongKong Holding Ltd., Korea branch
Identification Number: 110181-0036373
Name of Company Representative: Karen Theresa Burns
Business Registration Number: 120-84-06711
Address: 2788 San Tomas Expressway, Santa Clara, CA 95051

Product Information
Product Category: N/A
Name of Product: N/A
Detailed Product Model Name (Number): N/A
Date of first market release: N/A
Weight of Product: N/A
Manufacturer and/or Importer: NVIDIA Corporation

This form publicly certifies that NVIDIA has undergone the confirmation and evaluation procedures for the acceptable amounts of hazardous materials contained in the graphic card according to the regulations stipulated in Article 3 of the ‘Statute on the Recycling of Electrical and Electronic Products, and Automobiles’ and that the company has adhered to the Enforcement Regulations of Article 11, Item 1 of the statute.

Attachment: None
*Preparing the Form


① Please indicate the product category according to the categories listed in Article 8, Items
1and 2 of the ‘ Enforcement Ordinance of the Statute on the Recycling of Electrical, Electronic
and Automobile Materials’
② For electrical and electronic products, please indicate the Model Name (and number). For
automobiles, please indicate the Vehicle Identification Number.
③ Please indicate the name of manufacturer and/or importer of the product.

RUSSIA/KAZAKHSTAN/BELARUS
Customs Union Technical Regulations (CU TR)

This device complies with the technical regulations of the Customs Union (CU TR)
This device complies with the rules set forth by Federal Agency of Communications and the
Ministry of Communications and Mass Media
Federal Security Service notification has been filed.

TAIWAN
Bureau of Standards, Metrology & Inspection (BSMI)

This device complies with CNS 13438 (2006) Class B.

Product Name: Jetson Nano Developer Kit開發者組件

Taiwan RoHS Material Content Declaration

限用物質含有情况標示聲明書
Declaration of the presence condition of the Restricted Substances Marking
設備名稱:Jetson Nano Developer Kit
Equipment Name: Jetson Nano Developer Kit
單元 限用物質及其化學符號
Parts Restricted substances and its chemical symbols


多溴聯 多溴二苯
铅 汞 镉 六價铬
苯 醚
(Pb ) (Hg) (Cd) (Cr(VI))
(PBB) (PBDE)
印刷電路板
O O O O O O
PCB
處理器
O O O O O O
Processor
主動電子零件 - O O O O O
Active components
被動電子零件
- O O O O O
Passive components
存儲設備
O O O O O O
Memory
機械部件
- O O O O O
Mechanicals
連接器/線材
- O O O O O
Connectors/Cable
焊接金屬
O O O O O O
Soldering material
助焊劑,錫膏,標籤及耗材
Flux, Solder Paste, label and O O O O O O
other consumable materials

備考1:O:系指該限用物質未超出百分比含量基準值
Note 1: O:indicates that the percentage content of the restricted substance does not exceed the
percentage of reference value of presence.
備考2: -:系指該项限用物質为排外项目。
Note 2:-:indicates that the restricted substance corresponds to the exemption.

此表中所有名稱中含 “-” 的部件均符合歐盟 RoHS 立法。


All parts named in this table with an “-” are in compliance with the European Union’s RoHS Legislation.

注:環保使用期限的參考標識取决與產品正常工作的温度和濕度等條件
Note: The referenced Environmental Protection Use Period Marking was determined according to normal
operating use conditions of the product such as temperature and humidity.

CHINA
China RoHS Material Content Declaration

产品中有害物质的名称及含量
The Table of Hazardous Substances and their Content
根据中国《电器电子产品有害物质限制使用管理办法》
as required by Management Methods for Restricted Use of Hazardous Substances in Electrical
and Electronic Products


有害物质
部件名称 Hazardous Substances
Parts 多溴二苯
铅 汞 镉 六价铬 多溴联苯

(Pb) (Hg) (Cd) (Cr(VI)) (PBB)
(PBDE)
印刷电路板
O O O O O O
PCB
处理器
O O O O O O
Processor
主动电子零件
X O O O O O
Active components
被动电子零件
X O O O O O
Passive components
存储设备
O O O O O O
Memory
机械部件
X O O O O O
Mechanicals
连接器/线材
X O O O O O
Connectors / Cable
焊接金属
O O O O O O
Soldering material
助焊剂,锡膏,标签及耗

Flux, Solder Paste, label O O O O O O
and other consumable
materials
本表格依据SJ/T 11364-2014 的规定编制
The table according to SJ/T 11364-2014

O:表示该有害物质在该部件所有均质材料中的含量均在GB/T 26572-2011 标准规定的限量要求


以下。
O: Indicates that this hazardous substance contained in all of the homogeneous materials for
this
part is below the limit requirement in GB/T 26572-2011.
X:表示该有害物质至少在该部件的某一均质材料中的含量超出GB/T 26572-2011 标准规定的限
量要求。
X: Indicates that this hazardous substance contained in at least one of the homogeneous
materials used for this part is above the limit requirement in GB/T 26572-2011.

此表中所有名称中含 “X” 的部件均符合欧盟 RoHS 立法。


All parts named in this table with an “X” are in compliance with the European Union’s
RoHS Legislation.

注:环保使用期限的参考标识取决于产品正常工作的温度和湿度等条件
Note: The referenced Environmental Protection Use Period Marking was determined according
to normal operating use conditions of the product such as temperature and humidity.


INDIA
India RoHS Compliance Statement
This product, as well as its related consumables and spares, complies with the reduction
in hazardous substances provisions of the “India E-waste (Management and Handling)
Rule 2016”. It does not contain lead, mercury, hexavalent chromium, polybrominated
biphenyls or polybrominated diphenyl ethers in concentrations exceeding 0.1 weight %
and 0.01 weight % for cadmium, except for where allowed pursuant to the exemptions
set in Schedule 2 of the Rule.

India RoHS Self-Declaration Form


(as per E-Waste (Management) Rules, 2016)
Sr. No.: i.
Product Category & Code (as per Schedule I of E-Waste (M) Rules, 2016): ITEW2
Product name: Jetson Nano Developer Kit
Model No.: P3450
Product Weight (g): N/A
Date of placing on market: N/A
Compliance with RoHS, Yes/No: Yes
RoHS information provided on product info booklet, Yes/No: Yes
In case product is imported from other country, name of the country manufactured: China


Notice
© 2017-2020 NVIDIA Corporation. All rights reserved. NVIDIA, the NVIDIA logo, Jetson, Jetson Nano, and JetPack
are trademarks and/or registered trademarks of NVIDIA Corporation in the U.S. and other countries. Other
company and product names may be trademarks of the respective companies with which they are associated.
NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER
DOCUMENTS (TOGETHER AND SEPARATELY, "MATERIALS") ARE BEING PROVIDED "AS IS." NVIDIA MAKES NO
WARRANTIES, EXPRESS, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND ALL
EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY
OR CONDITION OF TITLE, MERCHANTABILITY, SATISFACTORY QUALITY, FITNESS FOR A PARTICULAR PURPOSE
AND NON-INFRINGEMENT, ARE HEREBY EXCLUDED TO THE MAXIMUM EXTENT PERMITTED BY LAW.
Information furnished is believed to be accurate and reliable. However, NVIDIA Corporation assumes no
responsibility for the consequences of use of such information or for any infringement of patents or other rights
of third parties that may result from its use. No license is granted by implication or otherwise under any patent
or patent rights of NVIDIA Corporation. Specifications mentioned in this publication are subject to change
without notice. This publication supersedes and replaces all information previously supplied. NVIDIA
Corporation products are not authorized for use as critical components in life support devices or systems
without express written approval of NVIDIA Corporation.

www.nvidia.com

21.4. Jetson Nano Developer Kit 40-Pin Expansion Header Configuration - Application Note

Jetson Nano Developer Kit 40-Pin Expansion Header Configuration
Application Note

DA_09565-001 | August 2019



Document History
Doc_Number

Version Date Authors Description of Change


1.0 July 9, 2019 jsachs, jonathanh, twarren Initial release

1.1 July 24, 2019 jsachs, jonathanh, twarren Minor corrections

1.2 August 7, 2019 jsachs, jonathanh, twarren Minor corrections


Table of Contents
Customizing the Jetson Nano 40-pin Expansion Header .........................................................1
Prerequisites .................................................................................................................... 1
Download and Customize the Pinmux Spreadsheet ...................................................... 1
Download the L4T Driver Package and Source Files ..................................................... 2
Update the U-Boot Pinmux .............................................................................................. 2
Update the CBoot Pinmux ................................................................................................ 4
Flash Jetson Nano............................................................................................................ 5


Customizing the Jetson Nano 40-pin Expansion Header

The Jetson Nano Developer Kit carrier board includes a 40-pin expansion header. By default,
all interface signal pins are configured as GPIO inputs, except pins 3 and 5 and pins 27 and 28,
which are I2C SDA and SCL, and pins 8 and 10, which are UART TX and RX.
This application note describes how to alter the function of pins on the 40-pin header by using
the Jetson Nano Developer Kit pinmux spreadsheet. Note that the pinmux actually configures
the SoC on the Jetson module, which ultimately routes the SoC signals to the carrier board’s
40-pin header.
If you want to configure other SoC pins, see the full Jetson Nano module pinmux spreadsheet
and the NVIDIA L4T Development Guide for complete documentation.

Prerequisites
• A Jetson Nano Developer Kit.
• A computer running Linux (a Linux host) with the GCC toolchain recommended for building L4T installed. For more details, see the “The L4T Toolchain” section in the L4T Development Guide.
• A computer running Microsoft Windows (a Windows host) with Microsoft Excel installed.

Download and Customize the Pinmux Spreadsheet


1. On the Windows host, download the Jetson Nano Developer Kit Pinmux spreadsheet from
the Jetson Download Center.
2. Open the file in Microsoft Excel and ensure that:

• The spreadsheet file is writable.


• The Excel option “Enable Editing” or “Enable Content” is selected. You may need to set
one of these options if Excel displays warnings such as “PROTECTED VIEW” or
“SECURITY WARNING.”
• The spreadsheet macros are enabled. The spreadsheet needs them to generate device
tree source files.
3. Modify columns AR (“Customer Usage”) and AS (“Pin Direction”) to change the function of
individual pins on the developer kit’s 40-pin header.
4. Check the following columns through BA to see if any other values must be adjusted.

Jetson Nano Developer Kit 40-Pin Expansion Header GPIO Usage Considerations - Application Note (DA-09753-001_v1.0)

List of Figures
Figure 1. TXB Devices Typical Usage .................................................................................2
Figure 2. TXB Device Usage on Developer Kit Carrier Board...............................................2
Figure 3. Simplified TXB0108 Architecture Diagram ...........................................................3
Figure 4. One TXB Connection from Jetson Nano to 40-Pin Header ....................................6
Figure 5. Audio Codec (I2S and MCLK) Example Connections ............................................7
Figure 6. Button and LED Example Connections ................................................................7


Introduction

This application note describes how to work with the signals on the 40-pin expansion header on the NVIDIA® Jetson Nano™ Developer Kit carrier board. All signals routed to this connector from the Jetson Nano module, except for the I2C interfaces, pass through TI TXB0108RGYR level shifters, which translate the module's 1.8V signal levels to 3.3V levels at the expansion connector pins. The level shifters pass bidirectional signals without the need for a direction pin. There are, however, several points to keep in mind when using the signals that come from (or go to) these level shifters; they are described in this application note.


TXB0108 Level Shifter

Figure 1 shows the usage of the TXB devices and Figure 2 shows the TXB on the carrier board.

Figure 1. TXB Devices Typical Usage


(Schematic not reproduced: a single TXB0108 with VccA = 1.8V on the controller side, VccB = 3.3V on the peripheral side, an OE enable input, and eight level-shifted channels A0–A7 / B0–B7.)

Figure 2. TXB Device Usage on Developer Kit Carrier Board


(Schematic not reproduced: two TXB0108 devices on the carrier board translate the Jetson module's 1.8V SFIO/GPIO signals to 3.3V and route them to the 40-pin header; the OE pins are held low by 1MΩ pull-downs during power-up and then enabled through 20kΩ pull-ups to 1.8V. The I2C0/I2C1 signals go to the header directly, bypassing the level shifters.)



TXB Voltage Translator Architecture


The TXB push-pull buffered architecture supports bidirectional signaling without the need for
a direction control signal to select whether a pin is an output or an input. This type of device is
intended to interface with push-pull CMOS output drivers or high impedance loads, such as
standard CMOS inputs. The TXB0108 devices are not intended for use in open-drain
applications or to drive low-impedance loads (e.g., directly driving an LED) or high-capacitance loads (e.g., driving a long wire or multiple loads). To support driving either of these load types, an
additional buffer/transistor/FET may be required.

Figure 3. Simplified TXB0108 Architecture Diagram


VccA
VccB

OE
One
Shot
T1

4kΩ

One
Shot
T2
A B
One
T3 Shot

4kΩ

One
T4 Shot

The TXB level shifters have output buffers with ~4kΩ resistors in series which make them very
weak. A one-shot (OS) circuit is included to help with signal rise/fall times. The OS circuitry is
enabled when a rising or falling edge is sensed at one of the inputs.
When an A-port pin is connected to a push-pull driver output or a strong pull-up/down resistor
(equivalent to a push-pull driver) and driven high, the weak buffer (w/series resistor) drives the
B-port high along with the OS circuit which is enabled when the rising edge is sensed. The B-
port is driven high by both the buffer and T1 until the OS circuit times out after which only the
weak buffer continues to drive the output.



If, instead, the external push-pull driver drives the A-Port pin low, the B-port will be driven low
along with the OS circuit which is enabled when the falling edge is sensed. The B-port is driven
low by both the buffer and T2 until the OS circuit times out and only the weak buffer continues
to drive the output. (See Figure 3).
The TXB-type level shifter is called "weak-buffered" because it is strong enough to hold the
output port high or low during a DC state with a high-impedance load, but weak enough that it
can easily be overdriven by a push-pull driver (or strong pull-up/down resistor). This allows
the device to support inputs or outputs on both the A- and B-port sides.
An external push-pull driver or strong pull-up/down resistor must be able to supply more than
±2 mA of current to reliably overdrive the weak output buffer, which is always active as long as
the OE pin is active (high). If a pull-up/down resistor is used to force a TXB pin to a high or low
state, similar to a push-pull driver, the resistor should be no larger than roughly VccX / 2 mA
(~1.65 kΩ or stronger if VccX is 3.3 V, or ~0.9 kΩ or stronger if VccX is 1.8 V).
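
As a quick sanity check on those figures, the limit follows directly from Ohm's law, R ≤ VccX / 2 mA. The following minimal sketch only reproduces that arithmetic for the two supply voltages mentioned above; it is an illustration, not part of the NVIDIA application note.

# Minimal sketch: largest pull-up/down resistor that can still overdrive the
# TXB0108 weak output buffer, assuming the >= 2 mA overdrive current quoted above.
I_OVERDRIVE = 2e-3  # amperes

for vcc in (3.3, 1.8):                  # supply rails discussed in the text
    r_max = vcc / I_OVERDRIVE           # Ohm's law: R = V / I
    print(f"VccX = {vcc} V -> R <= {r_max:.0f} Ohm ({r_max / 1000:.2f} kOhm)")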

“Keeper” Pull-up and Pull-down Resistors


If the design requires weak "keeper" pull-up or pull-down resistors on any of the lines connected
to the TXB devices, the resistors must be weak. When a TXB output is in a steady high or low
state, the weak buffer with its 4 kΩ series resistor is driving the line. If an external "keeper"
pull-up or pull-down resistor is added, a resistor divider network is formed with the 4 kΩ series
resistor. If the "keeper" resistor value is too small, the level driven by the TXB buffer may not
meet a valid VOH (high) or VOL (low) level, which could lead to unreliable operation. The value
of the external "keeper" resistor should be > 50 kΩ. It is better to omit these "keeper" resistors
whenever possible, as they have little effect while the TXB devices are powered.
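
The divider effect can be made concrete with a short calculation: when the weak buffer drives high through its ~4 kΩ series resistor against a keeper pull-down to ground, the pin only reaches Vcc · R_keeper / (R_keeper + 4 kΩ). The sketch below is only that arithmetic; the keeper values are illustrative and not taken from the application note.

# Minimal sketch: voltage divider formed by the TXB's ~4 kOhm series resistor
# and an external "keeper" pull-down resistor (example values, assumed).
R_SERIES = 4_000  # ohms, internal to the TXB output buffer

for r_keeper in (10_000, 50_000, 100_000):
    voh_fraction = r_keeper / (r_keeper + R_SERIES)  # fraction of Vcc reached when driving high
    print(f"keeper = {r_keeper / 1000:.0f} kOhm -> VOH ≈ {voh_fraction:.1%} of Vcc")

With a 10 kΩ keeper the high level sags to roughly 71 % of Vcc, which is why values above 50 kΩ are recommended.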
In the case of the Jetson Nano Developer Kit carrier board design, the TXB level shifters' OE is
active whenever the system is powered on. There are very weak (1 MΩ) pull-downs on the OE
pins to keep them disabled as the power comes up, and stronger (20 kΩ) pull-ups to 1.8 V to
enable the TXB devices once power is enabled.

Driving Capacitive Loads


The TXB level shifters can drive up to about 70 pF. The OS circuitry, which is triggered when a
rising or falling edge is detected, is only enabled for ~5 ns. If the capacitive load is too large,
the OS will time out before the signal reaches a fully high or low level. After that, the signal
continues to rise or fall, but driven only by the weak output buffer, which can lead to slower
than desired rise and fall times.
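
To get a feel for why larger loads become slow once the ~5 ns one-shot has expired, a back-of-the-envelope estimate is the RC time constant of the ~4 kΩ weak buffer against the load capacitance. The sketch below is only that rough arithmetic; the load values are illustrative and not from the application note.

# Rough estimate: edge slow-down when only the ~4 kOhm weak buffer drives the line.
R_SERIES = 4_000  # ohms

for c_load in (70e-12, 220e-12, 1e-9):   # example load capacitances (assumed)
    tau = R_SERIES * c_load              # RC time constant in seconds
    print(f"C = {c_load * 1e12:.0f} pF -> tau ≈ {tau * 1e9:.0f} ns")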



Output Enable
The TXB level shifters have an output enable (OE) pin. When OE is low, the outputs (weak buffers
and OS circuits) are disabled. When OE is high, the weak buffers are always enabled, and the OS
circuits are enabled whenever a rising or falling edge is detected. The Jetson Nano Developer Kit
carrier board pulls the OE pins to 1.8 V, so they are always enabled when the system is powered on.


Jetson Nano Developer Kit Examples

The following are several examples in which pins from the 40-pin expansion header are connected
to a device or circuit, together with some points to consider. The following figure shows eight
signals from the Jetson Nano that are routed to one of the TXB level shifters and then to the
40-pin expansion header. These signals support the following options on the Jetson Nano:
• I2S0_SCLK: Audio I2S interface shift clock or GPIO
• I2S0_DOUT: Audio I2S interface data output or GPIO
• I2S0_DIN: Audio I2S interface data input or GPIO
• I2S0_FS: Audio I2S interface frame sync or GPIO
• GPIO09: GPIO or Audio Master Clock
• GPIO13: GPIO or PWM
• GPIO11: GPIO or General Purpose Clock
• GPIO01: GPIO or General Purpose Clock

Figure 4. One TXB Connection from Jetson Nano to 40-Pin Header

[Diagram: the I2S0_SCLK, I2S0_DOUT, I2S0_DIN, I2S0_FS, GPIO09, GPIO13, GPIO11 and GPIO01 signals routed from the Jetson Nano module through one TXB0108 (VccA = 1.8 V, VccB = 3.3 V; OE held high by a 20 kΩ pull-up, with a 1 MΩ pull-down) to the 40-pin expansion header.]
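When one of these pins is left in its GPIO function, it can be driven directly from Python on the Jetson Nano. The following is a minimal sketch, assuming NVIDIA's Jetson.GPIO package is installed; header pin 33 (GPIO13 in the figure) is only an assumed choice, so adjust the pin number to your wiring.

import time
import Jetson.GPIO as GPIO   # NVIDIA's Python GPIO library for the Jetson family

OUTPUT_PIN = 33              # assumption: 40-pin header pin 33 (GPIO13); adjust as needed

GPIO.setmode(GPIO.BOARD)                             # use 40-pin header numbering
GPIO.setup(OUTPUT_PIN, GPIO.OUT, initial=GPIO.LOW)
try:
    for _ in range(10):                              # toggle the level-shifted pin a few times
        GPIO.output(OUTPUT_PIN, GPIO.HIGH)
        time.sleep(0.5)
        GPIO.output(OUTPUT_PIN, GPIO.LOW)
        time.sleep(0.5)
finally:
    GPIO.cleanup()                                   # release the pin again

Measured at the header, the signal swings at 3.3 V (VccB in the figure), while the module side of the TXB0108 switches at 1.8 V.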


The following example shows a possible connection of the Jetson Nano I2S0 interface and
GPIO09 (Audio MCLK) to an audio codec.

Figure 5. Audio Codec (I2S and MCLK) Example Connections

[Diagram: 40-pin header wired to an audio codec; I2S0_SCLK, I2S0_FS, I2S0_DIN and I2S0_DOUT connect to the codec's SCLK, FS, DIN and DOUT pins, and GPIO09 provides MCLK.]

The following example shows possible connections to a button and an LED.

Figure 6. Button and LED Example Connections

[Diagram: a push-button wired to a header GPIO through a 1.5 kΩ pull-up to 3.3 V, and an LED on the 5 V rail switched by a 2N3904 transistor whose base is driven from another header GPIO through a 6.8 kΩ resistor, with a 100 Ω series resistor for the LED.]
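
From software, the wiring in this figure boils down to one input pin and one output pin. The following is a minimal sketch, assuming Jetson.GPIO is installed, the button shorts its pin to ground against the external 1.5 kΩ pull-up when pressed, and the transistor base is driven from header pin 33; the pin numbers are assumptions taken loosely from the figure, so adapt them to your wiring.

import time
import Jetson.GPIO as GPIO

BUTTON_PIN = 7    # assumption: button input (GPIO09 in the figure); external 1.5 kOhm pull-up
LED_PIN = 33      # assumption: drives the 2N3904 base resistor; adjust to your wiring

GPIO.setmode(GPIO.BOARD)
GPIO.setup(BUTTON_PIN, GPIO.IN)                       # pull level is set by the external resistor
GPIO.setup(LED_PIN, GPIO.OUT, initial=GPIO.LOW)
try:
    while True:
        pressed = GPIO.input(BUTTON_PIN) == GPIO.LOW  # assumed wiring: pressing shorts pin to GND
        GPIO.output(LED_PIN, GPIO.HIGH if pressed else GPIO.LOW)
        time.sleep(0.05)                              # simple 50 ms polling loop
except KeyboardInterrupt:
    pass
finally:
    GPIO.cleanup()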


22. RASP CAM HQ Raspberry Pi Camera, 12 MP, Sony IMX477R
• https://www.arducam.com/raspberry-pi-high-quality-camera-lens/

• https://www.raspberrypi.org/products/raspberry-pi-high-quality-camera/

• https://www.reichelt.de/de/de/raspberry-pi-kamera-12mp-sony-imx477r-rasp-cam-hq-p276
html?PR

RASP CAM HQ
The Raspberry Pi High Quality Camera is the latest camera accessory from Raspberry Pi
(supplied without a lens, see accessories).
Compared to the previous v2 model it offers a higher resolution (12 megapixels instead of
8 megapixels) and higher sensitivity (roughly 50 % more area per pixel for improved low-light
performance), and it is designed for use with interchangeable C- and CS-mount lenses. Other
lens form factors can be accommodated using third-party lens adapters.
It was designed for industrial and consumer applications, including security cameras, that
demand the highest degree of visual fidelity and/or integration with specialist optics.

Technical Data

• compatible with all Raspberry Pi models
• Sony IMX477R sensor
• 12.3 megapixel resolution
• 7.9 mm sensor diagonal
• 1.55 µm × 1.55 µm pixel size
• Output: RAW12/10/8, COMP8
• Back focus: adjustable (12.5 mm to 22.4 mm)
• Lens standard: C-mount
• integrated IR cut filter
• Tripod mount: 1/4"-20

Scope of Delivery

• circuit board with a Sony IMX477 sensor (without lens, see accessories)
• a 200 mm FPC cable for connection to a Raspberry Pi
• a milled aluminium lens mount with integrated tripod mount and focus adjustment ring


• a C- to CS-mount adapter

Optionally available (see accessories)

• 6 mm wide-angle lens, order no. RPIZ CAM 6MM WW
• 16 mm telephoto lens, order no. RPIZ CAM 16MM TO

Note

• the latest Raspberry Pi software is required to operate the camera
• in production until at least January 2026
Version
Model: Raspberry Pi
Video: 30 fps at 1920 × 1080 pixels
Photo: 4056 × 3040 pixels
General
Version: standard
Resolution: 12 MP
Video: 60 fps at 1280 × 720 pixels
Viewing angle: 75°
Image sensor: 1/4"
Connections / interfaces
Connection: CSI
Miscellaneous
Specification: IMX477R
Lamp properties
Length: 38 mm
Dimensions
Width: 38 mm
Manufacturer information
Manufacturer: RASPBERRY PI
Manufacturer part number: SC0261
Packaging weight: 0.046 kg
RoHS compliant
EAN / GTIN: 0633696492738

Product information "Raspberry Pi High Quality Camera"

The Raspberry Pi High Quality Camera is the latest camera module from the Raspberry Pi
Foundation. It offers higher resolution (12 megapixels, compared to 8 megapixels) and
sensitivity (approximately 50 % more area per pixel for improved low-light performance) than
the existing Camera Module v2 and is designed for use with interchangeable lenses in C- and
CS-mount formats. Other lens form factors can be used with third-party lens adapters.
The High Quality Camera provides an alternative to Camera Module v2 for industrial and
consumer applications, including security cameras, that require the highest degree of visual
fidelity and/or integration with specialist optics. It is compatible with all Raspberry Pi
computers from the Raspberry Pi 1 Model B onwards, using the latest software release from
www.raspberrypi.org.
The package comprises a circuit board with a Sony IMX477 sensor, an FPC cable for connection
to a Raspberry Pi computer, a milled aluminium lens mount with integrated tripod mount and
focus adjustment ring, and a C- to CS-mount adapter. Using an adapter cable, the camera can
also be used with the Raspberry Pi Zero.

Technical Data
Sensor: Sony IMX477R stacked, back-illuminated sensor
7.9 mm sensor diagonal
1.55 µm × 1.55 µm pixel size
Output: RAW12/10/8, COMP8
Back focus: adjustable (12.5 mm to 22.4 mm)
Lens type: C-mount, CS-mount (C-CS adapter included)
IR cut filter: integrated (the filter can be removed, but this is irreversible)
Ribbon cable length: 200 mm
Tripod mount: 1/4"-20
The Raspberry Pi High Quality Camera will remain in production until at least January 2026.

MAXIMUM CAMERA RESOLUTION
According to its data sheet, the sensor of the Raspberry Pi HQ Camera would technically be
capable of capturing photos and videos at 4K / 60 Hz.
However, this would only be possible over a 4-lane CSI-2 interface. The Raspberry Pi uses a
2-lane CSI-2 interface and is additionally limited by its H.264 hardware encoder.
As a result, the resolution of the HQ Camera in combination with the Raspberry Pi is limited to
1920 × 1080 for video recordings and 2592 × 1944 for photos.
FOCAL LENGTH, WIDE ANGLE, TELEPHOTO, ZOOM: A SHORT LENS GUIDE
When choosing a suitable lens for the HQ Camera, you inevitably come across terms such as
focal length, telephoto lens and wide-angle lens. Here is a brief explanation:
In general, the focal length of a lens determines the field of view of your camera and is
measured in millimetres (mm). As a rule of thumb: the larger the focal-length number, the
narrower the field of view and the closer or larger the photographed object appears.
For a lens with a 6 mm focal length this means that the camera can "see" a large area, but the
object you want to photograph appears relatively small. Wide-angle lenses are therefore ideal
for landscape shots or pictures of a group of people.
If, instead, you want to photograph the church tower in the landscape or your mother-in-law at
the wedding party, a telephoto lens with a focal length of 16 mm or more is suitable.
The ideal combination of wide angle and telephoto is a zoom lens. Its focal length can be
changed flexibly via a zoom ring, so the photographed object appears larger or smaller.
Nikon offers a useful lens simulator on its website, where the effect of focal length is
demonstrated with a Nikon camera.
CONNECTING THE COMPONENTS
Since the Raspberry Pi Zero has a smaller camera connector, you first have to replace the cable
supplied with the HQ Camera. To do this, carefully pull the black latch of the ZIF connector
backwards. Caution: do not use too much force, as the connector is very delicate!
Now pull the white cable out of the connector and attach the Raspberry Pi Zero adapter cable by
sliding it into the connector with the gold contacts of the wide end facing down. Then carefully
push the black latch forward again.
The narrow end of the adapter cable is then inserted into the connector of the Raspberry Pi
Zero, also with the gold contacts facing down.
Finally, insert the prepared microSD card into the Pi Zero and connect the micro-USB cable to
the USB / J10 socket.
Your Raspberry Pi webcam is now ready for use!
CONNECTING TO THE COMPUTER AND FIRST TEST
You can now connect the USB-A plug of the micro-USB cable to your computer. After about
10 seconds the webcam is recognised as "Piwebcam" without any driver installation and is
available as a camera in every program with webcam support.
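
Instead of a webcam-capable application, the "Piwebcam" can also be checked from code. The following is a minimal sketch, assuming OpenCV for Python (cv2) is installed on the host computer and that the webcam enumerates as video device 0; the device index and output filename are placeholders.

import cv2

# Minimal sketch: grab a single frame from the USB webcam.
# Device index 0 is an assumption; check the available /dev/video* devices and adjust if needed.
cap = cv2.VideoCapture(0)
if not cap.isOpened():
    raise RuntimeError("Could not open video device 0")

ok, frame = cap.read()      # read one frame
cap.release()

if not ok:
    raise RuntimeError("Could not read a frame from the webcam")

cv2.imwrite("webcam_test.jpg", frame)   # save the frame for inspection
print("Saved webcam_test.jpg, frame shape:", frame.shape)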
23. RPIZ CAM 16mm
https://www.reichelt.de/raspberry-pi-16mm-kameralinse-teleobjektiv-rpiz-cam-16mm-to-p2769
html?&nbc=1&trstct=lsbght_sldr::276919
This 16 mm telephoto lens was designed for the original Raspberry Pi camera.
Scope of delivery
16 mm telephoto lens (without image-sensor board, see accessories)
Version
Model: Raspberry Pi
General
Version: accessory
Miscellaneous
Specification: telephoto
Manufacturer information
Manufacturer: RASPBERRY PI
Manufacturer part number: SC0123
Packaging weight: 0.214 kg
RoHS compliant
EAN / GTIN: 9900002769213

24. RPIZ CAM 6mm WW
https://www.reichelt.de/raspberry-pi-6mm-kameralinse-weitwinkel-rpiz-cam-6mm-ww-p276922.
html?&nbc=1&tr
This wide-angle camera lens was designed for the original Raspberry Pi camera.
Scope of delivery
6 mm wide-angle lens (without image-sensor board, see accessories)
Aperture: F1.2
Mount: CS
Minimum object distance: 20 cm
Version
Model: Raspberry Pi
General
Version: accessory
Miscellaneous
Specification: wide-angle
Manufacturer information
Manufacturer: RASPBERRY PI
Manufacturer part number: SC0124
Packaging weight: 0.107 kg
RoHS compliant
EAN / GTIN: 9900002769220

25. Data Sheets RaspCAM

25.1. Data Overview

Raspberry Pi
High Quality Camera

Published in April 2020


by Raspberry Pi Trading Ltd.

www.raspberrypi.org


NVIDIA JETSON NANO MODULE


TECHNICAL SPECIFICATIONS

GPU 128-core Maxwell


CPU Quad-core ARM A57 @ 1.43 GHz
Memory 4 GB 64-bit LPDDR4 25.6 GB/s
Storage 16 GB eMMC 5.1
Video Encode 4K @ 30 | 4x 1080p @ 30 | 9x 720p @ 30 (H.264/H.265)
Video Decode 4K @ 60 | 2x 4K @ 30 | 8x 1080p @ 30 | 18x 720p @ 30
(H.264/H.265)
CSI 12 (3x4 or 4x2) lanes MIPI CSI-2 D-PHY 1.1
Connectivity Gigabit Ethernet
Display HDMI 2.0, eDP 1.4, DP 1.2 (two simultaneously)
PCIe 1x1/2/4 PCIe Gen2
USB 1x USB 3.0, 3x USB 2.0
Others I2C, I2S, SPI, UART, SD/SDIO, GPIO
Mechanical 69.6 mm x 45 mm 260-pin SODIMM Connector

Learn more at https://developer.nvidia.com/jetson




25.2. Data Sheet Raspberry Pi High Quality Camera

Raspberry Pi
High Quality Camera

Published in April 2020


by Raspberry Pi Trading Ltd.

www.raspberrypi.org

Overview

The Raspberry Pi High Quality Camera is the latest camera accessory from
Raspberry Pi. It offers higher resolution (12 megapixels, compared to 8
megapixels), and sensitivity (approximately 50% greater area per pixel for
improved low-light performance) than the existing Camera Module v2, and
is designed to work with interchangeable lenses in both C- and CS-mount
form factors. Other lens form factors can be accommodated using third-party
lens adapters.
The High Quality Camera provides an alternative to the Camera Module v2
for industrial and consumer applications, including security cameras,
which require the highest levels of visual fidelity and/or integration with
specialist optics. It is compatible with all models of Raspberry Pi computer
from Raspberry Pi 1 Model B onwards, using the latest software release from
www.raspberrypi.org.1
The package comprises a circuit board carrying a Sony IMX477 sensor,
an FPC cable for connection to a Raspberry Pi computer, a milled aluminium
lens mount with integrated tripod mount and focus adjustment ring, and a C- to
CS-mount adapter.

1
Excluding early Raspberry Pi Zero models, which lack the necessary FPC connector. Later Raspberry Pi Zero
models require an adapter FPC, sold separately.




Specification

Sensor: Sony IMX477R stacked, back-illuminated sensor


12.3 megapixels
7.9 mm sensor diagonal
1.55 μm × 1.55 μm pixel size

Output: RAW12/10/8, COMP8

Back focus: Adjustable (12.5 mm–22.4 mm)

Lens standards: CS-mount


C-mount (C-CS adapter included)

IR cut filter: Integrated2

Ribbon cable length: 200 mm

Tripod mount:  1/4”-20

Compliance: FCC 47 CFR Part 15, Subpart B, Class B Digital Device

Electromagnetic Compatibility Directive (EMC)


2014/30/EU

Restriction of Hazardous Substances (RoHS)


Directive 2011/65/EU

Production lifetime: The Raspberry Pi High Quality Camera will remain


in production until at least January 2026

2
Can be removed to enable IR sensitivity. Modification is irreversible.




Physical specifications

[Mechanical drawing of the camera board; outline 38 mm × 38 mm, all dimensions in mm.]

SAFETY INSTRUCTIONS
To avoid malfunction or damage to this product, please observe the following:
•  Before connecting the device, shut down your Raspberry Pi computer and disconnect it from
external power.
• If the cable becomes detached, pull the locking mechanism forward on the connector, insert the ribbon
cable ensuring the metal contacts face towards the circuit board, then push the locking mechanism back
into place.
• This device should be operated in a dry environment at 0–50°C.
• Do not expose it to water or moisture, or place on a conductive surface whilst in operation.
• Do not expose it to excessive heat from any source.
• Care should be taken not to fold or strain the ribbon cable.
• Care should be taken when screwing in parts or fitting a tripod. A cross-thread can cause irreparable
damage and void the warranty.
• Take care whilst handling to avoid mechanical or electrical damage to the printed circuit board
and connectors.
• Avoid handling the printed circuit board whilst it is powered and only handle by the edges to minimise the
risk of electrostatic discharge damage.
• Store in a cool, dry location.
• Avoid rapid changes of temperature, which can cause moisture build up in the device, affecting
image quality.








25.3. Data Sheet Raspberry Pi High Quality Camera - Getting Started

Raspberry Pi
High Quality Camera
Getting started
Operating instructions, regulatory compliance
information, and safety information

Published in April 2020


by Raspberry Pi Trading Ltd.

www.raspberrypi.org

Operating instructions

[Annotated photo of the camera assembly: dust cap, optional C-CS adapter, back focus adjustment ring, back focus lock screw, optional tripod mount, main housing and sensor, ribbon cable to Raspberry Pi, main circuit board, mounting holes.]

Fitting lenses
The High Quality Camera is designed to accept CS-mount lenses. An optional
adapter is supplied to extend the back focus by 5 mm , such that the camera is
also compatible with C-mount lenses.
Please ensure that the dust cap is fitted when there is no lens fitted, because
the sensor is sensitive to dust. To fit a lens, unscrew the dust cap and screw the
lens into the threads. Remove the C-CS adapter when a CS-mount lens is to be
fitted; it is only required when a C-mount lens is fitted.
The CGL 6 mm CS-mount and 16 mm C-mount lenses are examples of third-
party products that are compatible with the High Quality Camera. See step-by-
step instructions for fitting the CS-mount and C-mount lenses.

Back focus adjustment


The back focus adjustment mechanism has two purposes:
1. When using a small, low-cost, fixed-focus lens, it allows adjustment of the focus.
2. When using an adjustable-focus lens, it allows adjustment of the focal range.
To adjust the back focus:
1. Ensure the lens is screwed all the way into the back focus adjustment ring.




2. Loosen the back focus lock screw with a small flat screwdriver.
3. Adjust the back focus height by turning the back focus adjustment ring clockwise or
anti-clockwise relative to the main housing until the camera is in focus.
4. Tighten the back focus lock screw.

Tripod mount
The tripod mount is an optional component, and it can be unscrewed when it is
not needed. If it is needed, take care not to damage the ribbon when screwing
the tripod into the camera.

Connecting the High Quality Camera to a Raspberry Pi computer


Ensure your Raspberry Pi is switched off, then carefully release the plastic
catch on the Raspberry Pi’s camera connector. Insert the camera ribbon with
the contacts facing away from the catch. Once you have pushed the ribbon
in as far as it will go, push the catch back in. It is now safe to power up your
Raspberry Pi.

Operating the camera


First, enable the camera in Raspbian: in the Raspbian menu, select Preferences,
then Raspberry Pi Configuration. Click the Interfaces tab, find the Camera
entry in the list, and select Enabled. Click OK, and reboot your Raspberry Pi
when prompted.
raspistill is a command line tool for capturing camera images. To check
your camera is correctly installed, open a terminal window (Raspbian menu >
Accessories > Terminal) and take a test photograph by entering the command:
raspistill -o test.jpg
When you hit ENTER a live preview image will appear, and after a default period
of five seconds, the camera will capture a single still image. This will be saved
in your home folder and named test.jpg.
To use the camera just as a viewfinder, without saving a photo, use this
command:
raspistill -t 0
For more detailed information about installing and operating camera software,
refer to https://www.raspberrypi.org/documentation/configuration/camera.md.
For more advanced information about controlling the camera, see
https://www.raspberrypi.org/documentation/raspbian/applications/camera.md.
Alternatively, refer to The Official Raspberry Pi Camera Guide, published by
Raspberry Pi Press.
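
The same capture can also be scripted instead of calling raspistill. The following is a minimal sketch, assuming the legacy picamera Python package is installed on the Raspberry Pi and the camera interface has been enabled as described above; the output filename is just an example.

from time import sleep
from picamera import PiCamera   # legacy Raspberry Pi camera stack (assumed installed)

camera = PiCamera()
try:
    camera.start_preview()        # live preview, like raspistill's preview window
    sleep(5)                      # give the sensor time to settle its exposure
    camera.capture('test.jpg')    # still image, analogous to `raspistill -o test.jpg`
finally:
    camera.stop_preview()
    camera.close()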




Other features of the camera

Rotating the camera


It is possible to position the camera circuit board upside down with respect
to the tripod mount. In this situation the ribbon will extend out from the top of
the unit, rather than from the bottom.
To rotate the camera:
1. Work in a clean and dust-free environment, as the sensor will be exposed to the air. Work
with the sensor facing downwards throughout the process.
2. Unscrew the two 1.5 mm hex lock keys on the underside of the main circuit board. Be careful
not to let the washers roll away.
3. Lift the main housing and rotate it 180 degrees. The gasket will remain on the circuit board,
so realign the housing with the gasket.
4. The nylon washer prevents damage to the circuit board; apply this washer first. Next, fit the
steel washer, which prevents damage to the nylon washer.
5. Screw down the two hex lock keys. As long as the washers have been fitted in the correct
order, they do not need to be screwed very tightly.

Infrared (IR) filter


The High Quality Camera contains an IR filter, which is used to reduce the
camera’s sensitivity to infrared light. It is possible to remove this filter, but doing
so will void the warranty on the product, and is likely to prove irreversible. See
further information on removing the IR filter.




Regulatory compliance

EU
The Raspberry Pi High Quality Camera is in conformity with the following
applicable community harmonised legislation:
Electromagnetic Compatibility Directive (EMC) 2014/30/EU,
Restriction of Hazardous Substances (RoHS) Directive 2011/65/EU
The following harmonised standards have been used to demonstrate conformity
to these standards:
EN 55032:2015
EN 55024:2010
IEC60695-2-11
EN60950-1

WEEE Directive Statement for the European Union


This marking indicates that this product should not be disposed with other household wastes throughout the
EU. To prevent possible harm to the environment or human health from uncontrolled waste disposal, recycle
it responsibly to promote the sustainable reuse of material resources. To return your used device, please use
the return and collection systems or contact the retailer where the product was purchased. They can take
this product for environmental safe recycling.

Note: See a full online copy of this Declaration


For compliance information for all Raspberry Pi products, see https://www.raspberrypi.org/compliance/

FCC
The Raspberry Pi High Quality Camera is in conformity with the requirements
of the following specifications:
FCC 47 CFR Part 15, Subpart B, Class B Digital Device.
This device complies with part 15 of the FCC Rules. Operation is subject to the
following two conditions: (1) This device may not cause harmful interference,
and (2) this device must accept any interference received, including interference
that may cause undesired operation.




Safety information

IMPORTANT: PLEASE RETAIN THIS INFORMATION FOR FUTURE REFERENCE

WARNINGS
• This product should only be connected to and powered by a Raspberry Pi computer. Any external power
supply used with the Raspberry Pi should comply with relevant regulations and standards applicable in
the country of intended use.
• This product should be operated in a well ventilated environment and should not be covered.
• This product should be placed on a stable, flat, non-conductive surface while it is in use, and it should not
be contacted by conductive items.

INSTRUCTIONS FOR SAFE USE


To avoid malfunction of or damage to your Raspberry Pi High Quality Camera, please observe the following:
• Do not expose it to water or moisture, or place it on a conductive surface whilst in operation.
• Do not expose it to heat from any source; the Raspberry Pi High Quality Camera is designed for reliable
operation at normal ambient room temperatures.
• Take care whilst handling to avoid mechanical or electrical damage to the printed circuit board and
exposed connectors. Use a tripod with the device to minimise damage to the electronic components.
• Avoid handling the Raspberry Pi High Quality Camera while it is powered. Handle only by the edges or by
the lens mount assembly to minimise the risk of causing damage by electrostatic discharge.
• Take care not to damage any of the exposed electronics components. These are easily damaged if the
unit is dropped, and this is especially the case if a large lens is fitted.

For all compliance certificates and numbers, please visit https://www.raspberrypi.org/compliance/





25.4. Data Sheet Raspberry Pi High Quality Camera - Component Drawing

[Mechanical drawing of the camera board; outline 38 mm × 38 mm, all dimensions in mm.]



25.5. Data Sheet Raspberry Pi High Quality Camera - 6 mm Lens Parameters

Parameters of the 6 mm lens
Model: PT361060M3MP12
Image format: 1/2"
Focal length: 6 mm
Resolution: 3 megapixel
Aperture: F1.2
Mount: CS
Field angle (D × H × V): 63°
Minimum object distance (M.O.D.): 0.2 m
Back focal length: 7.53 mm
Dimensions: φ30 × 34 mm
Weight: 53 g
Operation: manual iris
Size tolerance (mm): 10-30 ± 0.10
Angle tolerance: 30-120 ± 0.20

25.6. Data Sheet Raspberry Pi High Quality Camera - Mounting the Lenses

16 mm C-mount lens
The 16 mm lens provides a higher quality image
than the 6 mm lens. It has a narrow angle of view
which is more suited to viewing distant objects.

Fitting the C-CS adapter


Ensure the C-CS adapter that comes with the
High Quality Camera is fitted to the lens.
The lens is a C-mount device, so it has a
longer back focus than the 6 mm lens and
therefore requires the adapter.

Fitting the lens to the camera


Rotate the lens and C-CS adapter clockwise
all the way into the back focus adjustment ring.

Back focus adjustment ring


and lock screw
The back focus adjustment ring should be
screwed in fully. Use the back focus lock
screw to make sure it does not move out of
this position when adjusting the aperture or focus.

Aperture
To adjust the aperture, hold the camera with the
lens facing away from you. Turn the inner ring,
closest to the camera, while holding the camera
steady. Turn clockwise to close the aperture and
reduce image brightness. Turn anti-clockwise to
open the aperture. Once you are happy with the
light level, tighten the screw on the side of the
lens to lock the aperture into position.

Focus
To adjust focus, hold the camera with the
lens facing away from you. Turn the focus
ring, labelled “NEAR FAR”,
anti-clockwise to focus on a nearby
object. Turn it clockwise to focus on a
distant object. You may find you need to
adjust the aperture again after this.

6 mm CS-mount lens
A low-cost 6 mm lens is available for the High Quality
Camera. This lens is suitable for basic photography.
It can also be used for macro photography because it
can focus objects at very short distances.

Fitting the lens to the camera


The lens is a CS-mount device, so it has a short
back focus and does not need the C-CS adapter
that comes with the High Quality Camera.
Rotate the lens clockwise all the way into the
back focus adjustment ring.

Back focus adjustment ring


and lock screw
The back focus adjustment ring should be
screwed in fully for the shortest possible
back-focal length. Use the back focus lock
screw to make sure it does not move out
of this position when adjusting the aperture
or focus.

Aperture
To adjust the aperture, hold the camera with the lens
facing away from you. Turn the middle ring while
holding the outer ring, furthest from the camera, steady.
Turn clockwise to close the aperture and reduce image
brightness. Turn anti-clockwise to open the aperture.
Once you are happy with the light level, tighten the
screw on the side of the lens to lock the aperture.

Focus
To adjust focus, hold the camera with the lens facing
away from you. Hold the outer two rings of the lens;
this is easier if the aperture is locked as described
above. Turn the camera and the inner ring
anti-clockwise relative to the two outer rings to focus
on a nearby object. Turn them clockwise to focus on a
distant object. You may find you need to adjust the
aperture again after this.
Connectivity (connection)
Thread / connection: 1/4" (6.4 mm), 3D, ball head
Physical properties
Type: mini tripod
Leg segments: 3 sections (2-fold extendable)
Handle: none
Material: metal
Tripod head: 3D ball head
Dimensions & weight
Weight: 125 g
Min. to max. height: 14-26 cm
Field of application
Use: photo, video (3D)
Recommended application: photo / video
Design (colour, design, motif, series)
Colour: black
Shade: black
Product area: photo & video
Series: Ball XL
Article number: 00004065
GTIN: 4007249040657

Table 25.1.: Technical data; source: Hama

Figure 25.1.: Hama mini tripod "Ball", XL, black/silver; source: Hama

25.7. Hama Mini Tripod "Ball", XL, Black/Silver

The Hama "Ball" mini tripod has three legs, each with three segments, which can be set at
several angles. It features a 3D ball head for framing shots in portrait and landscape
orientation, a 1/4-inch tripod thread and a locking screw. The working height can be adjusted
between 14 and 26 cm.
Literature
[ABF11] L. Akremi, N. Baur, and S. Fromm, eds. Datenanalyse mit SPSS für
Fortgeschrittene 1 – Datenaufbereitung und uni- und bivariate Statistik.
Vol. 3. Springer Fachmedien, 2011.
[AIT20] L. Artificial Intelligence Techniques, ed. OpenNN. 2020. url: https :
//www.opennn.net/.
[AR+16] R. Al-Rfou et al. “Theano: A Python framework for fast computation
of mathematical expressions”. In: arXiv e-prints abs/1605.02688 (May
2016). [pdf], [Link]. url: http://arxiv.org/abs/1605.02688.
[AY18] N. S. Artamonov and P. Y. Yakimov. “Towards Real-Time Traffic Sign
Recognition via YOLO on a Mobile GPU”. In: Journal of Physics :
Conference Series. 1096th ser. (2018). doi: 10.1088/1742-6596/1096/
1/012086.
[Aba+16] M. Abadi et al. “TensorFlow: A System for Large-Scale Machine Learning”.
In: 12th USENIX Symposium on Operating Systems Design and Imple-
mentation (OSDI 16). Savannah, GA: USENIX Association, Nov. 2016,
pp. 265–283. isbn: 978-1-931971-33-1. url: https://www.usenix.org/
conference/osdi16/technical-sessions/presentation/abadi.
[Aga18a] A. F. M. Agarap. “On Breast Cancer Detection: An Application of Ma-
chine Learning Algorithms on the Wisconsin Diagnostic Dataset”. In:
Proceedings of the 2nd International Conference on Machine Learning and
Soft Computing. ICMLSC ’18. Link, Link. Phu Quoc Island, Viet Nam:
Association for Computing Machinery, 2018, pp. 5–9. isbn: 9781450363365.
doi: 10.1145/3184066.3184080. url: https://doi.org/10.1145/
3184066.3184080.
[Aga18b] A. F. Agarap. “Deep Learning using Rectified Linear Units (ReLU)”. In:
(Mar. 22, 2018). doi: 10.48550/ARXIV.1803.08375. arXiv: 1803.08375
[cs.NE]. url: https://arxiv.org/pdf/1803.08375v2.
[Ala08] R. Alake. Implementing AlexNet CNN Architecture Using TensorFlow
2.0+ and Keras. 14.08.2020. url: https://towardsdatascience.com/
implementing- alexnet- cnn- architecture- using- tensorflow- 2-
0-and-keras-2113e090ad98.
[Alo+18] M. Z. Alom et al. The History Began from AlexNet: A Comprehensive
Survey on Deep Learning Approaches. Tech. rep. [pdf]. 2018. arXiv: 1803.
01164 [cs.CV].
[And35] E. Anderson. “The irises of the Gaspé Peninsula”. In: Bulletin of the
American Iris Society 59 (1935), pp. 2–5.
[Ani20] A. Anis. Getting Started with TensorFlow 2. Ed. by KDnuggets. 2020.
url: https : / / www . kdnuggets . com / 2020 / 07 / getting - started -
tensorflow2.html.
[Ard21] Arducam, ed. Featured Camera Modules Supported. 2021. url: https://
www.arducam.com/docs/usb- cameras/featured- camera- modules-
supported/ (visited on 06/16/2021).


[Aru18] Arunava. Convolutional Neural Network – Towards Data Science. 2018.


url: https : / / towardsdatascience . com / convolutional - neural -
network-17fb77e76c05 (visited on 11/08/2020).
[BA94] R. J. Brachman and T. Anand. “The Process of Knowledge Discovery in
Databases: A First Sketch”. In: AAAI Technical Report WS-94-03 (1994).
[BS18] N. Brown and T. Sandholm. “Superhuman AI for heads-up no-limit poker:
Libratus beats top professionals”. In: Science 359.6374 (2018), pp. 418–
424. issn: 0036-8075. doi: 10.1126/science.aao1733. eprint: https:
//science.sciencemag.org/content/359/6374/418.full.pdf. url:
https://science.sciencemag.org/content/359/6374/418.
[BS19] P. Buxmann and H. Schmidt. Künstliche Intelligenz. Springer, 2019.
[Bal19] Balakreshnan. Inferencing Model - Jetson Nano Without Docker container.
Eingesehen am 20.04.2020 [online]. 2019. url: https://github.com/
balakreshnan/WorkplaceSafety/blob/master/InferencinginJetsonNano.
md#inferencing-model---jetson-nano-without-docker-container.
[Bec18a] F. Beck. Neuronale Netze – Eine Einführung – Aktivität. 2018. url:
http://www.neuronalesnetz.de/aktivitaet.html.
[Bec18b] R. Becker. Machine / Deep Learning – Wie lernen künstliche neuronale
Netze? Eingesehen am 27.01.2020 [online]. 2018. url: https://jaai.de/
machine-deep-learning-529/.
[Bec18c] R. Becker. Transfer Learning – So können neuronale Netze voneinander
lernen. Eingesehen am 30.01.2020 [online]. 2018. url: https://jaai.
de/transfer-learning-1739/.
[Bri15] D. Britz. Understanding Convolutional Neural Networks for NLP. 2015.
url: http://www.wildml.com/2015/11/understanding-convolutional-
neural-networks-for-nlp/.
[Bro+17] T. B. Brown et al. “Adversarial Patch”. In: Neural Information Processing
Systems - Machine Learning and Computer Security Workshop. [pdf]. 2017.
arXiv: 1712.09665 [cs.CV].
[Cha+00] P. Chapman et al. “CRISP-DM 1.0: Step-by-step data mining guide”.
In: (2000). url: https://www.semanticscholar.org/paper/CRISP-
DM-1.0%3A-Step-by-step-data-mining-guide-Chapman-Clinton/
54bad20bbc7938991bf34f86dde0babfbd2d5a72.
[Cha17] M. Chablani. “YOLO — You only look once, real time object detection ex-
plained”. In: Towards Data Science (2017). url: https://towardsdatascience.
com/yolo- you- only- look- once- real- time- object- detection-
explained-492dc9230006.
[Cho18] F. Chollet. Deep Learning mit Python und Keras: Das Praxis-Handbuch
vom Entwickler der Keras-Bibliothek. MITP-Verlags GmbH & Co. KG,
2018.
[Cho+19] A. Chowdhery et al. Visual Wake Words Dataset. [pdf], [github] . 2019.
doi: 10.48550/ARXIV.1906.05721. arXiv: 1906.05721 [cs.CV]. url:
https://github.com/Mxbonn/visualwakewords.
[Cor20] N. Corporation, ed. CUDA. 2020. url: https://developer.nvidia.
com/cuda-zone.
[DK16] S. Dodge and L. Karam. Understanding How Image Quality Affects Deep
Neural Networks. 2016. url: https://arxiv.org/pdf/1604.04004.
pdf.

[DMM20] K. Dokic, M. Martinovic, and D. Mandusic. “Inference speed and quanti-


sation of neural networks with TensorFlow Lite for Microcontrollers frame-
work”. In: 2020 5th South-East Europe Design Automation, Computer
Engineering, Computer Networks and Social Media Conference (SEEDA-
CECNSM). [pdf]. 2020, pp. 1–6. doi: 10.1109/SEEDA- CECNSM49515.
2020.9221846.
[DV16] V. Dumoulin and F. Visin. “A guide to convolution arithmetic for deep
learning”. In: (Mar. 23, 2016). arXiv: 1603 . 07285 [stat.ML]. url:
https://arxiv.org/pdf/1603.07285v2.
[Den+09] J. Deng et al. “ImageNet: A large-scale hierarchical image database”.
In: 2009 IEEE Conference on Computer Vision and Pattern Recognition.
2009, pp. 248–255. doi: 10.1109/CVPR.2009.5206848.
[Den12] Deng, L. “The MNIST Database of Handwritten Digit Images for Ma-
chine Learning Research [Best of the Web]”. In: IEEE Signal Processing
Magazine 29.6 (2012). [pdf], Link, pp. 141–142.
[Dor13] C. F. Dormann. Datenanalyse mit SPSS für Fortgeschrittene 1 – Date-
naufbereitung und uni- und bivariate Statistik. Vol. 2. Springer-Verlag,
2013.
[Dör18] S. Dörn. “Neuronale Netze”. In: Programmieren für Ingenieure und Natur-
wissenschaftler. Springer, 2018, pp. 89–148.
[Düs00] R. Düsing. “Knowledge Discovery in Databases”. In: Wirtschaftsinformatik
42.1 (2000), pp. 74–75. issn: 1861-8936. doi: 10.1007/BF03250720.
[Ecl20] Eclipse, ed. Deep Learning for Java. 2020. url: https://deeplearning4j.
org/.
[Ert16] W. Ertel. Grundkurs Künstliche Intelligenz. Wiesbaden: Springer Fachme-
dien Wiesbaden, 2016. isbn: 978-3-658-13548-5. doi: 10.1007/978-3-
658-13549-2.
[FPSS96] U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth. “From Data Mining to
Knowledge Discovery in Databases”. In: 17.3 (1996). [pdf], p. 37. doi:
10 . 1609 / aimag . v17i3 . 1230. url: https : / / www . aaai . org / ojs /
index.php/aimagazine/article/view/1230.
[FVP98] J. Funke and B. Vaterrodt-Plünnecke. Was ist Intelligenz? C.H. Beck
Wissen, 1998.
[Fac20] Facebook, ed. PyTorch. 2020. url: https://pytorch.org/.
[Fer12] D. A. Ferrucci. “Introduction to “This is Watson””. In: IBM Journal of
Research and Development 56.3.4 (2012), 1:1–1:15. doi: 10.1147/JRD.
2012.2184356.
[Fis] R. A. Fisher. “The use of Multiple Measurements in Taxonomic Problems”.
In: ().
[Fis36a] R. A. Fisher. Fisher’s Iris Dataset. [pdf]. 1936. url: https://archive.
ics.uci.edu/ml/datasets/iris.
[Fis36b] R. A. Fisher. Fisher’s Iris Dataset. [pdf]. 1936. doi: 10.24432/C56C76.
url: https://archive.ics.uci.edu/ml/datasets/iris.
[Fol+11] M. Folk et al. “An overview of the HDF5 technology suite and its applica-
tions”. In: EDBT/ICDT 2011. 2011, pp. 36–47. doi: 10.1145/1966895.
1966900.
[Fou20a] P. S. Foundation, ed. glob — Unix style pathname pattern expansion.
Eingesehen am 15.12.2019 [online]. 2020. url: https://docs.python.
org/3/library/glob.html.

[Fou20b] P. S. Foundation, ed. numpy. Eingesehen am 15.12.2019 [online]. 2020.


url: https://numpy.org/.
[Fou20c] P. S. Foundation, ed. python-pickle-module-save-objects-serialization. Einge-
sehen am 17.12.2019 [online]. 2020. url: https://pythonprogramming.
net/python-pickle-module-save-objects-serialization/.
[Fou20d] P. S. Foundation, ed. random. Eingesehen am 16.12.2019 [online]. 2020.
url: https://docs.python.org/3/library/random.html.
[Fou21a] P. S. Foundation, ed. The Python Package Index. 2021. url: https :
//pypi.org.
[Fou21b] P. S. Foundation, ed. pycocotools. 2021. url: https : / / pypi . org /
project/pycocotools.
[GC06] P. Gluchowski and P. Charmoni. Analytische Informationssysteme: Busi-
ness Intelligence-Technologien und -Anwendungen. Vol. 3. 2006.
[GMG16] P. Gysel, M. Motamedi, and S. Ghiasi. “Hardware-oriented approximation
of convolutional neural networks”. In: arXiv preprint arXiv:1604.03168
(2016).
[GSS14] I. J. Goodfellow, J. Shlens, and C. Szegedy. “Explaining and Harnessing
Adversarial Examples”. In: arXiv 1412.6572 (2014). [pdf]. arXiv: 1412.
6572 [stat.ML].
[GSt16] GStreamer.org, ed. GStreamer. 2016. url: https://gstreamer.freedesktop.
org/.
[Goo10] Google, ed. Beginnen Sie mit TensorBoard. 30.10.2020. url: https :
//www.tensorflow.org/tensorboard/get_started.
[Goo11a] Google, ed. TensorFlow Lite converter. 4.11.2020. url: https://www.
tensorflow.org/lite/convert.
[Goo11b] Google, ed. Coral: Build beneficial and privacy preserving AI. 14/11/2019.
url: https://coral.withgoogle.com/.
[Goo19a] Google, ed. TensorFlow models on the Edge TPU. 2019. url: https:
//coral.withgoogle.com/docs/edgetpu/models-intro/.
[Goo19b] Google, ed. TensorFlow. 2019. url: https://www.tensorflow.org/.
[Goo19c] Google, ed. TensorFlow. 2019. url: https://www.tensorflow.org/
lite.
[Goo20a] Google, ed. Beginnen Sie mit TensorFlow Lite. 2020. url: https://www.
tensorflow.org/lite/guide/get_started#1_choose_a_model.
[Goo20b] Google, ed. TensorFlow - Convolutional Neural Network Example. 2020.
url: \url{https://www.tensorflow.org/tutorials/images/cnn}.
[Goo20c] Google, ed. TensorFlow Lite Guide. 2020. url: \url{https : / / www .
tensorflow.org/lite/guide}.
[Goo20d] Google, ed. tf.keras.datasets.cifar100.load_data. 2020. url: https://www.
tensorflow.org/api_docs/python/tf/keras/datasets/cifar100/
load_data.
[Gu+18] J. Gu et al. “Recent Advances in Convolutional Neural Networks”. In:
Pattern Recognition 77 (2018), pp. 354–377. issn: 0031-3203. doi: 10.
1016/j.patcog.2017.10.013. url: http://arxiv.org/pdf/1512.
07108v6.
[Gup20] D. Gupta. Activation Functions – Fundamentals Of Deep Learning. 2020.
url: https://www.analyticsvidhya.com/blog/2020/01/fundamentals-
deep-learning-activation-functions-when-to-use-them/.

[Gö] U. Göttingen, ed. Kapitel 3: Erste Schritte der Datenanalyse. url: https:
//www.uni-goettingen.de/de/document/download/9b4b2033cba125be183719130e524467.
pdf/mvsec3.pdf.
[HDC20] L. Huawei Device Co., ed. Huawei Kirin 990. https://consumer.huawei.
com/en/campaign/kirin-990-series/Huawei Kirin 990. 2020.
[Hdf] Hierarchical Data Format, version 5. 1997. url: http://www.hdfgroup.
org/HDF5/.
[He+16] K. He et al. “Deep Residual Learning for Image Recognition”. In: Proceed-
ings of the IEEE Conference on Computer Vision and Pattern Recognition
(CVPR). [pdf]. 2016. doi: 10.1109/cvpr.2016.90.
[How+] A. G. Howard et al. MobileNets: Efficient Convolutional Neural Networks
for Mobile Vision Applications. [pdf]. url: http://arxiv.org/pdf/
1704.04861v1.
[Int19a] Intel Corporation, ed. Intel Neural Compute Stick 2 - Data Sheet. . .
/IntelNCS2 / ExterneDokumente / Intel / NCS2 _ Datasheet - English .
pdfpdf. 2019. url: https://ark.intel.com/content/www/de/de/
ark/products/140109/intel-neural-compute-stick-2.html.
[Int19b] Intel Corporation, ed. OpenVino - System Requirements. 2019. url: https:
/ / software . intel . com / content / www / us / en / develop / tools /
openvino-toolkit/system-requirements.html.
[JES20] Y. Jia and E. Evan Shelhamer. Caffe. Ed. by B. A. I. Research. 2020.
url: http://caffe.berkeleyvision.org/.
[JL07] J. Janssen and W. Laatz. Statistische Datenanalyse mit SPSS für Windows.
Vol. 6. Springer-Verlag, 2007.
[Jai+20] A. Jain et al. Efficient Execution of Quantized Deep Learning Models: A
Compiler Approach. 2020. doi: arxiv-2006.10226. arXiv: 2006.10226
[cs.DC]. url: http://arxiv.org/pdf/2006.10226v1.
[Jet19] Jetsonhacks. Jetson Nano GPIO. 2019. url: https://www.jetsonhacks.
com/2019/06/07/jetson-nano-gpio/.
[Jun+19] B. Junjie et al. ONNX: Open Neural Network Exchange. 2019. url:
https://github.com/onnx/onnx.
[KH09] A. Krizhevsky and G. Hinton. Learning Multiple Layers of Features from
Tiny Images. 2009. url: https : / / www . cs . toronto . edu / ~kriz /
learning-features-2009-TR.pdf.
[KK07] I. Kononenko and M. Kukar, eds. Machine Learning and Data Mining.
Woodhead Publishing, 2007. isbn: 978-1-904275-21-3.
[KM06] L. A. Kurgan and P. Musilek. “A Survey of Knowledge Discovery and Data
Mining Process Models”. In: The Knowledge Engineering Review 21.1
(2006), pp. 1–24. issn: 0269-8889. doi: 10.1017/S0269888906000737.
[KSH12a] A. Krizhevsky, I. Sutskever, and G. E. Hinton. “ImageNet Classifica-
tion with Deep Convolutional Neural Networks”. In: Advances in Neural
Information Processing Systems. Ed. by F. Pereira et al. Vol. 25. Cur-
ran Associates, Inc., 2012, pp. 1097–1105. doi: 10.1145/3065386. url:
proceedings.neurips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-
Paper.pdf.
[KSH12b] A. Krizhevsky, I. Sutskever, and G. Hinton. “ImageNet Classification with
Deep Convolutional Neural Networks”. In: Neural Information Processing
Systems 25 (Jan. 2012). doi: 10 . 1145 / 3065386. url: http : / / www .
cs . toronto . edu / ~kriz / imagenet _ classification _ with _ deep _
convolutional.pdf.

[KSH17] A. Krizhevsky, I. Sutskever, and G. E. Hinton. CIFAR-10 and CIFAR-


100 datasets. 2017. url: https://www.cs.toronto.edu/~kriz/cifar.
html.
[Kaga] CIFAR-100 Python. 2020. url: https://www.kaggle.com/datasets/
fedesoriano/cifar100.
[Kagb] Iris Flower Dataset. 2018. url: https://www.kaggle.com/arshid/
iris-flower-dataset.
[Kha20] Khandelwal, Renu. A Basic Introduction to TensorFlow Lite. accessed
date 20.04.2022. 2020. url: https : / / towardsdatascience . com / a -
basic-introduction-to-tensorflow-lite-59e480c57292.
[Kri08] D. Kriesel. Ein kleiner Überblick über Neuronale Netze. Eingesehen am
28.12.2019 [online]. 2008. url: http://www.dkriesel.com/index.php.
[Kru+15] R. Kruse et al. Computational Intelligence. Wiesbaden: Springer Fachme-
dien Wiesbaden, 2015. isbn: 978-3-658-10903-5. doi: 10.1007/978-3-
658-10904-2.
[Kum19] T. Kummarikuntla. Camera Calibration with OpenCV. 2019. url: https:
/ / medium . com / analytics - vidhya / camera - calibration - with -
opencv-f324679c6eb7.
[Kur21] A. Kurniawan. IoT Projects with NVIDIA Jetson Nano: AI-Enabled
Internet of Things Projects for Beginners. Berkeley, CA: Apress, 2021.
isbn: 978-1-4842-6452-2. doi: 10.1007/978- 1- 4842- 6452- 2 _ 1. url:
https://doi.org/10.1007/978-1-4842-6452-2_1.
[Kä11] W.-M. Kähler. Statistische Datenanalyse: Verfahren verstehen und mit
SPSS gekonnt einsetzen. Vol. 7. Springer-Verlag, 2011.
[LCB13] Y. LeCun, C. Cortes, and C. Burges. MNIST handwritten digit database.
2013. url: http://yann.lecun.com/exdb/mnist/.
[LP20] López de Lacalle, Luis Norberto and J. Posada. “New Industry 4.0 Ad-
vances in Industrial IoT and Visual Computing for Manufacturing Pro-
cesses”. In: (2020).
[LXW19] Y. Lu, S. Xie, and S. Wu. “Exploring Competitive Features Using Deep
Convolutional Neural Network for Finger Vein Recognition”. In: IEEE
Access PP (Mar. 2019). [pdf], pp. 1–1. doi: 10 . 1109 / ACCESS . 2019 .
2902429.
[LeC+98] Y. LeCun et al. “Gradient-based learning applied to document recognition”.
In: Proceedings of the IEEE 86.11 (1998), pp. 2278–2324. doi: 10.1109/
5.726791.
[Lin+14] T. Lin et al. “Microsoft COCO: Common Objects in Context”. In: The
Computing Research Repository (CoRR) abs/1405.0312 (2014). arXiv:
1405.0312. url: http://arxiv.org/abs/1405.0312.
[Lin+21] T.-Y. Lin et al. COCO - Common Objects in Context. 2021. url: https:
//cocodataset.org/.
[MP43] W. S. McCulloch and W. Pitts. “A Logical Calculus of the Ideas Immanent
in Nervous Activity”. In: The Bulletin of Mathematical Biophysics 5.4
(1943), pp. 115–133. issn: 1522-9602. doi: 10.1007/BF02478259.
[MR10] O. Maimon and L. Rokach. Data Mining and Knowledge Discovery Hand-
book. Boston, MA: Springer US, 2010. isbn: 978-0-387-09822-7. doi: 10.
1007/978-0-387-09823-4.
[MXN20] A. S. F. A. MXNet, ed. MXNet. 2020. url: https://mxnet.apache.
org/.

[McC+06] J. McCarthy et al. “A Proposal for the Dartmouth Summer Research


Project on Artificial Intelligence, August 31, 1955”. In: AI Magazine 27.4
(2006). doi: doi.org/10.1609/aimag.v27i4.1904.
[Mck] “Smartening up with Artificial Intelligence (AI) - What’s in it for Germany
and its Industrial Sector?” In: McKinsey & Company, 2017.
[Meu20] R. Meude. TensorFlow 2.0 Tutorial: Optimizing Training Time Perfor-
mance. Ed. by KDnuggets. 2020. url: https://www.kdnuggets.com/
2020 / 03 / tensorflow - optimizing - training - time - performance .
html.
[Mic19] U. Michelucci. Advanced Applied Deep Learning. Berkeley, CA: Apress,
2019. isbn: 978-1-4842-4975-8. doi: 10.1007/978-1-4842-4976-5.
[Mic20] Microsoft, ed. The Microsoft Cognitive Toolkit. 2020. url: https://docs.
microsoft.com/en-us/cognitive-toolkit/.
[Moe18] J. Moeser. Künstliche neuronale Netze - Aufbau & Funktionsweise. 2018.
url: https : / / jaai . de / kuenstliche - neuronale - netze - aufbau -
funktion-291/.
[Mon17] S. Monk. Raspberry Pi Kochbuch. Heidelberg: O’Reilly, 2017.
[Mun96] D. C. Munson. “A note on Lena”. In: IEEE Transactions on Image
Processing 5.1 (1996), pp. 3–3. doi: 10.1109/TIP.1996.8100841.
[NVI15] NVIDIA Corporation, ed. NVIDIA TensorRT. 2015. url: \url{https:
//developer.nvidia.com/tensorrt}.
[NVI19] NVIDIA Corporation, ed. Getting Started With Jetson Nano Developer Kit.
2019. url: https://developer.nvidia.com/embedded/learn/get-
started-jetson-nano-devkit.
[NVI20a] NVIDIA Corporation, ed. Jetson Community Projects. 2020. url: https:
//developer.nvidia.com/embedded/community/jetson-projects.
[NVI20a] NVIDIA, ed. Installing TensorFlow For Jetson Platform :: NVIDIA Deep
Learning Frameworks Documentation. 2020. url: https://docs.nvidia.
com / deeplearning / frameworks / install - tf - jetson - platform /
index.html.
[NVI20b] NVIDIA Corporation, ed. Jetson Nano Developer Kit - User Guide. 2020.
url: https://developer.nvidia.com/embedded/jetpack.
[NVI20b] NVIDIA. jetson-gpio. 2020. url: https://github.com/NVIDIA/jetson-gpio.
[NVI20c] NVIDIA Corporation, ed. NVIDIA CUDA Toolkit 11.0.161. [pdf]. 2020.
[NVI21a] NVIDIA Corporation, ed. Jetson Nano Developer Kit - JetPack SD Card
Image. 2021. url: https://developer.nvidia.com/embedded/learn/
get-started-jetson-nano-devkit.
[NVI21b] NVIDIA Corporation, ed. NVIDIA JetPack. 2021. url: https://developer.nvidia.com/embedded/jetpack.
[Nam17] I. Namatevs. “Deep Convolutional Neural Networks: Structure, Feature
Extraction and Training”. In: Information Technology and Management
Science 20.1 (2017). doi: 10.1515/itms-2017-0007.
[Nie15] M. A. Nielsen. Neural Networks and Deep Learning. Determination Press, 2015. url: http://neuralnetworksanddeeplearning.com/chap1.html.
[Nwa+18] C. Nwankpa et al. “Activation Functions: Comparison of trends in Practice
and Research for Deep Learning”. In: (Nov. 8, 2018). arXiv: 1811.03378
[cs.LG]. url: https://arxiv.org/pdf/1811.03378v1.pdf.
[Onn] ONNX Runtime. 2021. url: https://onnxruntime.ai/.
[PN20] Preferred Networks, Inc., ed. Chainer. 2020. url: https://chainer.org/.
[PUC02] S. Pyo, M. Uysal, and H. Chang. “Knowledge Discovery in Database
for Tourist Destinations”. In: Journal of Travel Research 40.4 (2002),
pp. 374–384. issn: 0047-2875. doi: 10.1177/0047287502040004006.
[Ped+11] F. Pedregosa et al. “Scikit-learn: Machine Learning in Python”. In: Journal
of Machine Learning Research 12 (2011), pp. 2825–2830.
[QBL18] H. Qi, M. Brown, and D. G. Lowe. “Low-Shot Learning with Imprinted
Weights”. In: Proceedings of the IEEE conference on computer vision and
pattern recognition. 2018, pp. 5822–5830.
[RA20] c’t Redaktion. “c’t Python-Projekte”. 2020.
[Raj20] A. Raj. “AlexNet CNN Architecture on Tensorflow (beginner)”. In: (2020).
Ed. by Kaggle. url: https://www.kaggle.com/vortexkol/alexnet-
cnn-architecture-on-tensorflow-beginner.
[Ras16] Raspberry Pi Foundation, ed. Raspberry Pi Camera Module V2. 2016.
url: https://www.raspberrypi.org/products/camera-module-v2/
(visited on 11/14/2019).
[Ras19] Raspberry Pi Foundation, ed. Raspberry Pi 3 Model B+. 2019. url:
https://www.raspberrypi.org/products/raspberry-pi-3-model-
b-plus (visited on 11/14/2019).
[Red+15] J. Redmon et al. You Only Look Once: Unified, Real-Time Object Detection.
[pdf]. 2015. doi: 10.48550/ARXIV.1506.02640. url: https://arxiv.
org/pdf/1506.02640.pdf (visited on 02/17/2021).
[Red20] J. Redmon. MNIST in CSV. 2020. url: https://pjreddie.com/projects/mnist-in-csv/.
[Red21] J. Redmon. YOLO: Real-Time Object Detection. 2021. url: https://
pjreddie.com/darknet/yolo/.
[Ric83] E. Rich. Artificial Intelligence. McGraw-Hill Inc., US, 1983. isbn: 9780070522619.
[Run15] T. A. Runkler. Data Mining – Modelle und Algorithmen intelligenter
Datenanalyse. Vol. 2. Springer Fachmedien, 2015.
[SC21] J. Stock and T. Cavey. Who’s a Good Boy? Reinforcing Canine Behavior
in Real-Time using Machine Learning. 2021. arXiv: 2101.02380 [cs.CV].
url: https://arxiv.org/pdf/2101.02380.pdf.
[SH06] L. Sachs and J. Hedderich. Angewandte Statistik – Methodensammlung
mit R. Vol. 12. Springer-Verlag, 2006.
[SIG20a] Keras SIG, ed. Keras Dense Layer. 2020. url: https://keras.io/api/layers/core_layers/dense/.
[SIG20b] Keras SIG, ed. Keras. 2020. url: https://keras.io/.
[SJM18] N. Sharma, V. Jain, and A. Mishra. “An Analysis Of Convolutional Neural
Networks For Image Classification”. In: Procedia Computer Science 132
(2018), pp. 377–384. issn: 18770509. doi: 10.1016/j.procs.2018.05.
198.
[SKP15] F. Schroff, D. Kalenichenko, and J. Philbin. “FaceNet: A Unified Embed-
ding for Face Recognition and Clustering”. In: Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition (CVPR). [pdf].
2015, pp. 815–823.
[SM20] K. Sadekar and S. Mallick. Camera Calibration using OpenCV. 2020. url:
https://learnopencv.com/camera-calibration-using-opencv/.
[SP06] V. Sahni and C. Patvardhan. “Iris Data Classification Using Quantum Neural Networks”. In: AIP Conference Proceedings 864.1 (2006), pp. 219–227. doi: 10.1063/1.2400893. eprint: https://aip.scitation.org/doi/pdf/10.1063/1.2400893. url: https://aip.scitation.org/doi/abs/10.1063/1.2400893.
[SSS19] F. Siddique, S. Sakib, and M. A. B. Siddique. “Recognition of Handwritten
Digit using Convolutional Neural Network in Python with Tensorflow
and Comparison of Performance for Various Hidden Layers”. In: 2019 5th
International Conference on Advances in Electrical Engineering (ICAEE).
[pdf]. 2019, pp. 541–546.
[SW16] M. Schutten and M. Wiering. “An Analysis on Better Testing than Training Performances on the Iris Dataset”. In: Belgian Dutch Artificial Intelligence Conference. Sept. 2016.
[SZ15] K. Simonyan and A. Zisserman. Very Deep Convolutional Networks for
Large-Scale Image Recognition. [pdf]. 2015. arXiv: 1409.1556 [cs.CV].
[Sah18] S. Saha. A Comprehensive Guide to Convolutional Neural Networks – the ELI5 Way. 2018. url: https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53 (visited on 01/25/2018).
[Sam20] Samsung, ed. Samsung Exynos-980. 2020. url: https://www.samsung.com/semiconductor/minisite/exynos/products/mobileprocessor/exynos-980/.
[San+18] M. Sandler et al. “MobileNetV2: Inverted Residuals and Linear Bottle-
necks”. In: arXiv e-prints, arXiv:1801.04381 (2018). [pdf]. arXiv: 1801.
04381 [cs.CV].
[Sha+16] W. Shang et al. “Understanding and Improving Convolutional Neural
Networks via Concatenated Rectified Linear Units”. In: International
Conference on Machine Learning (ICML) (Mar. 2016). [pdf], pp. 2217–
2225.
[Sha18] S. Sharma. Aktivierungsfunktionen: Neuronale Netze. Ed. by AI-United. 2018. url: http://www.ai-united.de/aktivierungsfunktionen-neuronale-netze/.
[Sha+20] V. Shankar et al. “Evaluating Machine Accuracy on ImageNet”. In: Proceedings of the 37th International Conference on Machine Learning. Ed. by H. Daumé III and A. Singh. Vol. 119. Proceedings of Machine Learning Research. [pdf]. PMLR, 2020, pp. 8634–8644. url: http://proceedings.mlr.press/v119/shankar20c.html.
[Sit+18] C. Sitawarin et al. DARTS: Deceiving Autonomous Cars with Toxic Signs.
[pdf]. 2018. arXiv: 1802.06430 [cs.CR].
[Sou20] Facebook Open Source, ed. Caffe2. 2020. url: https://caffe2.ai/.
[Sta19] Stack Overflow Documentation, ed. Learning TensorFlow. [pdf]. 2019. url: https://riptutorial.com/tensorflow.
[Sze+14] C. Szegedy et al. Going Deeper with Convolutions. [pdf]. 2014. doi: 10.
48550/ARXIV.1409.4842. arXiv: 1409.4842 [cs.CV].
[TL20] M. Tan and Q. V. Le. EfficientNet: Rethinking Model Scaling for Convo-
lutional Neural Networks. [pdf]. 2020. arXiv: 1905.11946 [cs.LG].
[TS19] S. Tan and A. Sidhu. “CUDA implementation”. In: Apr. 2019, pp. 63–67.
isbn: 978-3-030-17273-2. doi: 10.1007/978-3-030-15585-8_5.
[Tea20a] OpenCV Team, ed. OpenCV. 2020. url: https://opencv.org/.
[Tea20b] OpenVINO Team, ed. OpenVINO. 2020. url: https://www.openvino.org/.
[Ten21] TensorFlow. tf.lite.Optimize. 2021. url: https://www.tensorflow.org/api_docs/python/tf/lite/Optimize.
[UK21] A. Unwin and K. Kleinman. “The Iris Data Set: In Search of the Source of Virginica”. In: Significance 18 (2021). [pdf]. doi: 10.1111/1740-9713.01589.
[Var20] P. Varshney. AlexNet Architecture: A Complete Guide. 2020. url: https:
//www.kaggle.com/code/blurredmachine/alexnet-architecture-
a-complete-guide (visited on 07/26/2023).
[Vin18] J. Vincent. “Google unveils tiny new AI chips for on-device machine learning”. In: The Verge (2018). url: https://www.theverge.com/2018/7/26/17616140/google-edge-tpu-on-device-ai-machine-learning-devkit.
[WS20] P. Warden and D. Situnayake. TinyML – Machine Learning with Ten-
sorFlow Lite on Arduino and Ultra-Low-Power Microcontrollers. [pdf].
O’Reilly Media, Incorporated, 2020, p. 504. isbn: 978-1-492-05204-3.
[Wan+16] F. Wang et al. “Where does AlphaGo go: from church-turing thesis to
AlphaGo thesis and beyond”. In: IEEE/CAA Journal of Automatica
Sinica 3.2 (2016), pp. 113–120. doi: 10.1109/JAS.2016.7471613.
[Web97] A. Weber. The USC-SIPI Image Database: Version 5. 1997. url: http://sipi.usc.edu/services/database/Database.html.
[Wik20] Wikipedia, ed. OpenCV. 2020. url: https://de.wikipedia.org/wiki/OpenCV (visited on 12/16/2019).
[Wro98] S. Wrobel. “Data Mining und Wissensentdeckung in Datenbanken”. In:
Künstliche Intelligenz 1 (1998). [pdf]. url: ftp://ftp.fhg.de/archive/
gmd/ais/publications/1998/wrobel.98.data-mining.pdf.
[ZJ17] W. Zhiqiang and L. Jun. “A review of object detection based on convolu-
tional neural network”. In: 2017 36th Chinese Control Conference (CCC).
2017, pp. 11104–11109. doi: 10.23919/ChiCC.2017.8029130.
[Zha+15] X. Zhang et al. Accelerating Very Deep Convolutional Networks for Clas-
sification and Detection. [pdf]. 2015. arXiv: 1505.06798 [cs.CV].
[Zie15] W. Ziegler. Neuronale Netze – Teil 2. 2015. url: https://entwickler.de/online/neuronale-netze-teil-2-159753.html (visited on 01/26/2020).
[fas20] fast.ai, ed. FastAI. 2020. url: https://www.fast.ai/.
Index
Libraries, 77

File
  (env), 145
  .ckpt, 191
  .tflite, 201
  /home/msr/python-envs/env, 145
  Anaconda, 147
  Anaconda Navigator, 147
  Anaconda Navigator-It, 147
  Anaconda Prompt-It, 146
  base, 149
  C:\Users\<username>\.keras\datasets, 66
  conda, 145
  Conversion.py, 201
  Conversion_Opt.py, 201
  deactivate, 145
  Distortion_Camera_Calibration_OpenCV.tex, 114
  env, 145
  file_name, 72
  flask, 146
  Git Bash, 145
  h5, 191
  h5py, 190
  Homebrew, 145
  imagenet-camera.py, 76
  Keras, 176
  keras, 190
  LinuxBrew, 145, 146
  logfiles, 184
  MyProject, 145–147
  new-env, 149
  numpy, 146, 162, 176, 180, 201
  OpenMP runtime, 194
  pandas, 162
  pip, 145, 154
  pipenv, 145, 146
  Pipfile, 146
  pipfile, 145
  pyplot, 162, 176
  python-envs, 145
  python3-venv, 145
  requests, 146
  requirement.txt, 145
  saved_model.pb, 206
  scikit-learn, 67
  SSHFS-Win, 200
  t10k-images-idx3-ubyte.gz, 65
  t10k-labels-idx1-ubyte.gz, 65
  TensorFlow, 190
  tflite-camera-button.py, 212
  train-images-idx3-ubyte.gz, 65
  train-labels-idx1-ubyte.gz, 65
  Training_CIFAR.py, 184
  virtualenv, 145, 146
  wheel, 145
  winfsp, 200

AlexNet, 53, 54, 76, 211
Alom:2018, 53
Amazon, 79
API, 15, 78, 84
Application Programming Interface, see API, 15, 78

backpropagation, 54
Basemap, 81
Bokeh, 81

Caffe, 79
Caffe2, 79
Canadian Institute for Advanced Research, 65, see CIFAR, see Dataset, 11, 12, 15, 60
Central Processing Unit, see CPU, 15, 82
Chainer, 79
CIFAR-10, 201
CIFAR-100, 201
CNN, 4, 15, 32, 45–49, 52, 53, 55, 58–61, 155
CNTK, 79
COCO, 15, 70–75, 84
Common Objects in Context, see COCO, 15, 70
Compute Unified Device Architecture, see CUDA, 15, 77
Convolutional Neural Network, see CNN, 4, 15, 32
CPU, 15, 82, 192, 193, 199
CUDA, 15, 77, 78, 82
cv2, 83

Dataset
  CIFAR, 11, 12, 15, 60, 62, 65–67, 201, 206, 211
  CIFAR-10, 65, 201, 206, 211
  CIFAR-100, 65, 201, 206
  Fisher’s Iris Data Set, 67–70, 162, 164–167, 169, 172
  ImageNet, 75
  MNIST, 64, 65
  NIST Special Database 1, 15, 64, 65
  NIST Special Database 3, 15, 64
  TensorFlow, 65
Deeplearning4J, 79
DeepScene, 76

EfficientNet, 54
Eli5, 82

FastAI, 79
feature maps, 56
Frameworks, 77

General Purpose Input Output, see GPIO, 6, 15, 98
glob, 83
GoogleNet, 76
GPIO, 6, 15, 98, 99, 109
GPU, 15, 53, 79, 82, 192–194, 199
Graphics Processing Unit, see GPU, 15, 53

hyperparameters, 55

I²C, 15, 99
I²S, 15, 99
Image classification, 52
ImageNet, 55, 76, see Dataset, 75
Inception, 55, 76
InceptionNet, 54
Inter-IC Sound, see I²S, 15, 99
Inter-Integrated Circuit, see I²C, 15, 99
IPython, 80

jetson-inference, 75
Jupyter-Notebook, 77

Keras, 78

LeNet, 53
LightGBM, 81

math, 83
matplotlib, 80
Microsoft Cognitive Toolkit, 79
Mlpy, 82
MNIST, 155, 156
MobileNet, 54, 55
MobileNetV2, 54
MXNet, 79

NetworkX, 81
NIST Special Database 1, see Dataset, 15, 64
NIST Special Database 3, see Dataset, 15, 64
NLTK, 81
NumPy, 80

Object detection, 52
OpenCV, 82
OpenNN, 83
OpenVINO, 82
os, 83

Pandas, 80
Pattern, 81
pickle, 84
PIL, 83
PyPI, 84
Python Software Foundation, 84
PyTorch, 78

random, 83
Rectified Linear Unit, see ReLu, 15, 58
ReLu, 15, 58, 62
Residual Neural Network, see ResNet, 15, 54
ResNet, 15, 54, 55, 75, 76
RLE, 15, 74
Run-Length Encoding, 15, 74

SBC, 15, 95
scikit-learn, 80
SciPy, 80
SCL, 15, 99
Scrapy, 81
SDA, 15, 99
Seaborn, 81
Secure Shell, see SSH, 15, 107
segmentation, 52
Serial Clock, see SCL, 15, 99
Serial Data, see SDA, 15, 99
Serial Peripheral Interface, see SPI, 15, 99
Single-Board-Computer, see SBC, 15, 95
Speicherprogrammierbare Steuerung (programmable logic controller), see SPS, 15, 93
SPI, 15, 99
SPS, 15, 93
SSD-Mobilenet-V2, 76
SSH, 15, 107, 108
Statsmodels, 82

TensorBoard, 158, 159
TensorFlow, 77
TensorFlow Lite, 78
TensorRT, 78
Theano, 79

UART, 15, 98, 99
UCI Machine Learning Repository, 67
Universal Asynchronous Receiver Transmitter, see UART, 15, 98

VGG, 54, 55, 76
Video Processing Unit, see VPU, 15, 82
VPU, 15, 82