
Book Review

Chapter 34: Experiences on Image and Video Processing with CUDA and OpenCL
Tim Child October 2011

Outline
Authors
Chapter Contents
Problem Statement
Technology or Algorithm
Background Subtraction
Pearson's Correlation Coefficient
Key Insights
Case Study 1: R-T Video Background Subtraction
Case Study 2: Cross Correlation
Final Evaluation

Reviewer's Conclusions
Q&A

Authors
Alptekin Temizel
Tugba Halici
Berker Logoglu
Tugba Taskaya Temizel
Fatih Omruuzun
Ersin Karaman

Problem Statement
Guide users in implementing GPU algorithms using CUDA and OpenCL
Video and image processing
Comparison of CUDA and OpenCL
Multiple architectures studied
Advantages of each for video and image processing

Technology
Hardware Set-up Descriptions
4 GPU Configurations
(3 CUDA, 1 OpenCL)

1 CPU Configuration
(OpenMP on 4 Cores)

Algorithms
1. Background Subtraction Algorithm
Find moving pixels
Update the background
Update the motion threshold

2. Pearson's Correlation Coefficient

$\mathrm{PMCC} = \dfrac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y}$

Background Subtraction
Algorithm
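Find Moving Pixels
Compare each pixel with the two prior images; a pixel is moving when
$|I_n(x,y) - I_{n-1}(x,y)| > T_n(x,y)$ and $|I_n(x,y) - I_{n-2}(x,y)| > T_n(x,y)$

Update the Background

$B_{n+1}(x,y) = \begin{cases} \alpha B_n(x,y) + (1-\alpha) I_n(x,y) & \text{if the pixel is stationary} \\ B_n(x,y) & \text{if the pixel is moving} \end{cases}$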

Updating Threshold
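The moving-pixel test and background update can be sketched in plain Python as a CPU reference; this is not the chapter's GPU kernel. The sketch assumes the frame differences are absolute differences, that the weight missing from the slide is a blending factor alpha, and that stationary pixels are blended while moving pixels keep their old background value. The function name `update_frame` and the default `alpha` are hypothetical. The slide does not show the threshold update, so the sketch leaves the threshold unchanged.

```python
def update_frame(cur, prev1, prev2, background, threshold, alpha=0.9):
    """One step of background subtraction over flat lists of pixel values.

    cur, prev1, prev2 -- intensities I_n, I_{n-1}, I_{n-2}
    background        -- background model B_n, one value per pixel
    threshold         -- motion threshold T_n, one value per pixel
    alpha             -- blending weight (assumed; not given on the slide)

    Returns (moving_mask, new_background).
    """
    moving_mask = []
    new_background = []
    for i_n, i_1, i_2, b, t in zip(cur, prev1, prev2, background, threshold):
        # A pixel is moving only if it differs from BOTH prior frames
        # by more than its threshold.
        is_moving = abs(i_n - i_1) > t and abs(i_n - i_2) > t
        moving_mask.append(is_moving)
        if is_moving:
            # Moving pixels do not contaminate the background model.
            new_background.append(b)
        else:
            # Stationary pixels are blended into the background.
            new_background.append(alpha * b + (1 - alpha) * i_n)
    return moving_mask, new_background
```

Each pixel is processed independently of its neighbours, which is what makes this kind of loop a natural fit for one GPU thread per pixel.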

Background Subtraction

Pearson's Correlation Coefficient

PMCC values lie in the range [-1, 1]

Value   Description
 1      Perfect match
 0      No correlation
-1      Perfect negative correlation
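As an illustration of the three values in the table, here is a minimal CPU sketch of the coefficient, computed as the covariance divided by the product of the standard deviations. It treats each image as a flat list of pixel intensities; the function name `pmcc` is hypothetical and the code is not taken from the chapter.

```python
import math


def pmcc(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so plain sums of products are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / math.sqrt(var_x * var_y)
```

Identical or linearly scaled inputs give a value of 1, a reversed ramp gives -1, and inputs whose deviations cancel give 0.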

Case Study 1: R-T Video Background Subtraction

Measure speed-up for various image sizes and architectures

Experiment 1
Single kernel
Best speed-up: 5.6x

Experiment 2
Single kernel vs. 3 kernels
Using multiple kernels decreased performance by 43% (due to independent memory accesses)

Experiment 3
Single kernel, serial vs. asynchronous I/O
Asynchronous I/O increased performance by 61% to 91%

Experiment 4
8-bit vs. 32-bit global memory access
32-bit access increased performance by 8% to 29%
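The idea behind Experiment 4 is that four 8-bit pixels fit into one 32-bit word, so a single 32-bit load fetches four pixels at once. The sketch below shows only that data layout, in plain Python rather than kernel code. The helper names `pack_pixels` and `unpack_word` are hypothetical, and little-endian byte order is an assumption.

```python
import struct


def pack_pixels(pixels):
    """Pack 8-bit pixels into 32-bit little-endian words, four per word."""
    if len(pixels) % 4 != 0:
        raise ValueError("pixel count must be a multiple of 4")
    raw = bytes(pixels)  # each value must be in 0..255
    return list(struct.unpack("<%dI" % (len(pixels) // 4), raw))


def unpack_word(word):
    """Recover the four 8-bit pixels stored in one 32-bit word."""
    return [(word >> (8 * k)) & 0xFF for k in range(4)]
```

For example, `pack_pixels([1, 2, 3, 4])` yields the single word `0x04030201`, and `unpack_word` on that word returns the original four pixels.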

[Figure: CS 1 Exp 1, speed-up without I/O]

[Figure: CS 1 Exp 1, speed-up with I/O]

[Figure: CS 1 Exp 2, multiple kernels]

[Figure: CS 1 Exp 3 and 4, asynchronous I/O and 8-bit vs. 32-bit access]

Case Study 2: Cross Correlation

Compute the PMCC between images

Experiment 1
Global memory only
Best speed-up: 13.25x

Experiment 2
Global memory and shared memory with coalesced access
Speed-up of 4.34x with coalesced access, 5.6x with shared memory

Experiment 3
Increasing the number of images
Performance gains increase with the number of images (89x speed-up at 8K images)

[Figure: CS 2 Exp 1, using global memory]

[Figure: CS 2 Exp 2, speed-up due to shared or coalesced memory]

[Figure: CS 2 Exp 3, increased number of images]

Key Insights
CUDA is faster than OpenCL*
Speed-ups between 11.6x and 89x are possible
Larger image sizes give more speed-up
I/O is significant and needs to be optimized
Efficient GPU memory access is vital
Using more kernels increases memory accesses and reduces performance

*The researchers did not effectively investigate why

Reviewer's Conclusions
Useful experiments
Conclusions are valid and make sense
Sometimes not clear what the baseline is
Hardware is now dated
Did not explore why CUDA differs from OpenCL

Q&A
