Chapter 34: Experiences on Image and Video Processing with CUDA and OpenCL
Tim Child October 2011
Outline
Authors
Chapter Contents
Problem Statement
Technology or Algorithm
Background Subtraction
Pearson's Correlation Coefficient
Key Insights
Case Study 1: R-T Video Background Subtraction
Case Study 2: Cross Correlation
Final Evaluation
Authors
Alptekin Temizel, Tugba Halici, Berker Logoglu, Tugba Taskaya Temizel, Fatih Omruuzun, Ersin Karaman
Problem Statement
Guide users in implementing GPU algorithms using CUDA and OpenCL
Video and image processing
Comparison of CUDA and OpenCL
Technology
Hardware Set-up Descriptions
4 GPU Configurations (3 CUDA, 1 OpenCL)
1 CPU Configuration (OpenMP on 4 Cores)
Algorithms
1. Background Subtraction Algorithm
Find Moving Pixels
Updating the Background
Updating the Motion Threshold
Background Subtraction
Algorithm
Find Moving Pixels
Compare each pixel with the same pixel in the prior 2 images
$|I_n(x,y) - I_{n-1}(x,y)| > T_n(x,y)$ and $|I_n(x,y) - I_{n-2}(x,y)| > T_n(x,y)$
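The moving-pixel test can be sketched on the CPU as a plain reference implementation. This is a minimal sketch, not the chapter's kernel: the function name `moving_mask` and the list-of-rows frame layout are hypothetical, and it assumes the test compares the magnitude of each frame difference against the per-pixel threshold.

```python
def moving_mask(cur, prev1, prev2, thresh):
    """Classify each pixel as moving (True) or non-moving (False).

    cur, prev1, prev2: frames I_n, I_{n-1}, I_{n-2} as lists of rows of ints.
    thresh: per-pixel threshold T_n with the same shape.
    Assumption: the differences are compared by magnitude (absolute value).
    """
    return [
        [
            # A pixel is moving only if it differs from BOTH prior frames.
            abs(c - p1) > t and abs(c - p2) > t
            for c, p1, p2, t in zip(row_c, row_p1, row_p2, row_t)
        ]
        for row_c, row_p1, row_p2, row_t in zip(cur, prev1, prev2, thresh)
    ]
```

On a GPU, each pixel's test is independent of every other pixel, so one thread per pixel is the natural mapping.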
Bn+1 =
Updating Threshold
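The right-hand sides of the background and threshold updates are not preserved on these slides, so the sketch below is only an assumption about their shape. It uses a common running-average scheme in which non-moving pixels blend the new frame into the background and moving pixels are left unchanged. The function name, the blending factor `alpha`, and the scale factor `k` are all hypothetical and are not taken from the chapter.

```python
def update_background_and_threshold(cur, bg, thresh, moving, alpha=0.92, k=5.0):
    """Return (new_bg, new_thresh) for one frame.

    ASSUMED update rule (not taken from the slides):
      non-moving: B' = alpha*B + (1-alpha)*I
                  T' = alpha*T + (1-alpha)*k*|I - B|
      moving:     B' = B, T' = T
    """
    new_bg, new_thresh = [], []
    for row_c, row_b, row_t, row_m in zip(cur, bg, thresh, moving):
        bg_row, t_row = [], []
        for c, b, t, m in zip(row_c, row_b, row_t, row_m):
            if m:
                # Moving pixel: keep the old background and threshold.
                bg_row.append(b)
                t_row.append(t)
            else:
                # Non-moving pixel: blend the current frame into both.
                bg_row.append(alpha * b + (1 - alpha) * c)
                t_row.append(alpha * t + (1 - alpha) * k * abs(c - b))
        new_bg.append(bg_row)
        new_thresh.append(t_row)
    return new_bg, new_thresh
```

Both updates read the same inputs for a given pixel, which is one reason a single fused pass over the image is attractive.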
Background Subtraction
-1
Experiment 2
Single vs. 3 Kernels
Using multiple kernels decreases performance by 43% (due to independent memory accesses)
Experiment 3
Single Kernel, Serial vs. Async I/O
Async I/O increased performance by 61%-91%
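A rough timing model can show why overlapping transfers with computation helps. This is a minimal sketch under an idealized assumption: upload, kernel, and download for different frames overlap perfectly in a three-stage pipeline. The function names and the stage times in the usage note are hypothetical, not measurements from the chapter.

```python
def serial_time(n_frames, upload, kernel, download):
    """Total time when each frame is uploaded, processed, and downloaded in turn."""
    return n_frames * (upload + kernel + download)


def overlapped_time(n_frames, upload, kernel, download):
    """Total time for an idealized 3-stage pipeline with full overlap.

    The first frame pays for all three stages; every later frame adds
    only the cost of the slowest stage.
    """
    if n_frames == 0:
        return 0
    slowest = max(upload, kernel, download)
    return upload + kernel + download + (n_frames - 1) * slowest
```

With hypothetical stage times of 2, 3, and 1 time units over 10 frames, the serial schedule takes 60 units and the overlapped one 33. In CUDA this kind of overlap is typically obtained with streams and asynchronous copies from pinned host memory. Real gains depend on how balanced the stages are.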
Experiment 4
8-bit vs. 32-bit Global Memory Access
32-bit access increased performance by 8%-29%
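The idea behind 32-bit access is that four 8-bit pixels are fetched with one load instead of four. The sketch below only illustrates the data layout on the CPU and says nothing about GPU timing. The function name `load_pixels_32bit` is hypothetical, and little-endian byte order is an assumption.

```python
import struct


def load_pixels_32bit(buffer):
    """Read 8-bit pixels from `buffer` as 32-bit words, four pixels per load.

    Returns (pixels, n_loads). Assumes len(buffer) is a multiple of 4
    and little-endian packing.
    """
    if len(buffer) % 4 != 0:
        raise ValueError("buffer length must be a multiple of 4")
    n_words = len(buffer) // 4
    words = struct.unpack("<%dI" % n_words, buffer)
    pixels = []
    for w in words:
        # Extract the four 8-bit pixels packed in this 32-bit word.
        pixels.extend((w >> shift) & 0xFF for shift in (0, 8, 16, 24))
    return pixels, n_words
```

For an 8-pixel buffer this performs 2 word loads rather than 8 byte loads. A GPU kernel would do the equivalent by reading a 4-byte type per thread instead of a single byte.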
Experiment 2
Global Memory & Shared Memory with Coalesced Access
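The second case study is built on Pearson's correlation coefficient. A plain CPU reference is useful for checking GPU results against. This is a minimal sketch of the standard formula, not the chapter's GPU mapping, and the function name `pearson` is hypothetical.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of centered products and centered squares.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

The three sums are reductions over the data, which is where shared memory and coalesced global loads typically matter on a GPU.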
Key Insights
CUDA is faster than OpenCL*
Speed-ups of 11.6x to 89x are possible
Larger image sizes give more speed-up
I/O is significant and needs to be optimized
Efficient GPU memory access is vital
Using more kernels increases memory access and reduces performance
Reviewer's Conclusions
Useful experiments
Conclusions are valid and make sense
Sometimes not clear what the baseline is
Hardware is now dated
Q&A