Journal of Computing, eISSN 2151-9617, http://www.JournalofComputing.org
Published by: Journal of Computing on May 13, 2012
Copyright: Attribution Non-commercial
A Performance Study of Applying CUDA-Enabled GPU in Polar Hough Transform for Lines
Ghaith Makey, Kanj Al-Shufi and Mustafa Sayem El-Daher
Abstract — With the advent of modern GPGPUs, which can be used efficiently in general-purpose applications, the use of the multithreaded cores and high memory bandwidth of CUDA-enabled GPGPUs in digital image processing and feature extraction has been raised to a new level. This paper uses NVIDIA's CUDA language to compute the polar Hough transform for lines, an important method for image feature extraction; a performance study of this implementation and a comparison with sequential CPU computation are included. The study was carried out on a GPGPU that is inexpensive and available to research laboratories in developing countries.
Index Terms — Image feature extraction, Parallel computing, Graphics processors, Performance Analysis
1 INTRODUCTION
In order to meet demands from the 3D graphics industry, GPUs have developed distinctive power in parallel computation and relatively huge memory bandwidth [1]. These properties have recently inspired the use of the GPU as a general-purpose computing unit, and many applications can benefit from these GPGPU properties. Two of the important applications that have driven the adoption of GPGPUs are image processing and feature extraction. In this context, this paper studies the efficiency of using an NVIDIA CUDA-enabled GPGPU to compute the polar Hough transform for lines, which is a key technique in image processing and pattern recognition [2].
2 RELATED WORK
A large amount of interest has lately been given to the general-purpose computing abilities of CUDA-enabled GPUs, and a large number of applications have been accelerated by modifying them to run on these GPUs. In 2008, Shuai Che et al. introduced a performance study of general-purpose applications on graphics processors using CUDA, in which several general-purpose applications were run on both GPU and CPU and their performance was compared [3].
3 CUDA
CUDA is an abbreviation for Compute Unified Device Architecture; it is defined as a general-purpose parallel computing architecture that leverages the parallel compute engine in NVIDIA GPGPUs to solve many complex computational problems in a more efficient way than on a CPU [1]. CUDA's functions are easy to learn when a strong background in both C and GPGPU architecture is present. For more information about CUDA one can refer to reference [1]. Many tutorials, learning media, and documentation can be found on the NVIDIA CUDA website or in reference [4].
4 HOUGH TRANSFORM
4.1 Overview
The Hough transform (Hough, 1962) is a key method of image feature extraction by shape matching, and it has been used widely to extract the basic shapes in images. The method has proven efficient but still has high computational requirements, and here lies the benefit of using GPGPU power to fulfill these requirements [2].
4.2 Hough transform for lines
Hough transform for lines converts all the points of a line into a specific point in an accumulator space. Let us say that we have a line with this equation in the Cartesian space:

y = mx + c   (1)
Ghaith Makey is a PhD student at The Higher Institute of Laser Research and Applications, Damascus University.
K. Al-Shufi is an Associate Professor at Damascus University and vice dean of The Higher Institute of Laser Research and Applications, Damascus, Syria.
M. Sayem El-Daher is an Associate Professor of computational physics at the Physics Department, Damascus University, Damascus, Syria.
JOURNAL OF COMPUTING, VOLUME 4, ISSUE 4, APRIL 2012, ISSN 2151-9617, https://sites.google.com/site/journalofcomputing, WWW.JOURNALOFCOMPUTING.ORG
where m is the line slope and c is the intercept with the y-axis. This equation can be rearranged as:

Ay + Bx = 1   (2)

where A = 1/c and B = −m/c. Since each pair (A, B) defines a line in the Cartesian space, the x and y coordinates of each point on a line define a line in the parametric space of A and B. All the lines in the (A, B) parametric space that come from the points of a single line in the (x, y) Cartesian space intersect at the point corresponding to the slope and intercept of that line [2].

However, in practice it is more convenient to use the polar parameters (ρ, θ) instead of the Cartesian parameters (m, c), for several reasons, such as the infinite value of m in the case of vertical lines and the very large range of values within which c may fall [2]. In this case the line equation is written as:

x cos θ + y sin θ = ρ   (3)

where ρ is the normal distance to the line from the origin and θ is the angle of this normal.
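As a quick sanity check of (3), the following worked example (added here for illustration, not part of the original) evaluates the polar form for a simple horizontal line:

```latex
% Worked example: the horizontal line y = 2.
% Its normal from the origin points along the y-axis, so \theta = 90^{\circ}
% and the normal distance from the origin is \rho = 2.
% Check with the point (x, y) = (5, 2), which lies on the line:
\rho = x\cos\theta + y\sin\theta
     = 5\cos 90^{\circ} + 2\sin 90^{\circ}
     = 5 \cdot 0 + 2 \cdot 1
     = 2 .
% Every point of the line yields the same (\rho, \theta) = (2, 90^{\circ}),
% which is exactly the single accumulator cell the line maps to.
```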
Fig. 1. Polar form of a line.
In this case we can guarantee that the θ value is between 0 and 180 degrees, and that the ρ value is between 0 and √(M² + N²) (4), where M is the width of the input image and N is its height.
Fig. 2. Polar Hough transform (b) for an image with two lines (a).
5 HARDWARE AND IMPLEMENTATION
5.1 Hardware specifications
The platform is Windows 7 based. The CPU used is an AMD Phenom™ 9850 processor at 2.51 GHz. The GPGPU used is an NVIDIA GeForce 9400 GT, which was available for less than $40 at the time of this work. The Compute Capability of this GPGPU is 1.1, which is almost the minimum compute capability a GPU can provide (but better than GPUs with Compute Capability 1.0 in that it supports integer atomic functions operating on 32-bit words in global memory). All the baseline methods used on the CPU are sequential.
5.2 CPU based functions

5.2.1 Matlab function
For calculating the polar Hough transform for lines we have used the following Matlab function:
Fig. 3. Matlab function to calculate Polar Hough Transform for lines.
The 2D array passed as the first argument represents the input image. The 2D array passed as the second argument represents the result (all its elements are set to zero before being passed to the function). The integer variables rows and columns are the dimensions of the input array, and the remaining variable is the maximum ρ value, calculated by (4). The version of Matlab used in this work is 7.8.
5.2.2 C function
Fig. 4. C function to calculate Polar Hough Transform for lines.
The input variable definitions for this function are the same as for the Matlab code, except that the arrays are passed as pointers. For compiling this code we used Microsoft Visual Studio 6.
5.3 GPU based kernel
Fig. 5. CUDA Kernel to calculate Polar Hough Transform for lines.
The input variable definitions for this kernel are the same as for the Matlab code. The arrays are passed as pointers into device memory; stages of copying data between the host memory and the device memory are required before and after calling the kernel.

Thread parallelism has been applied here over the parameter theta, to ensure that each thread writes to a separate location of the accumulator array. Thus this kernel needs only 16 blocks of 16 threads each, whatever the dimensions of the input image. When we instead parallelized the threads over the image-coordinate parameters, we had to use atomic functions to ensure there was no conflict in writing to the device memory; however, the serialization imposed by the atomic function when many threads try to use the same memory address was so slow on our GPU that no speedup was achieved.

We could not use the shared memory, because each thread would have required a larger shared memory than our GPGPU was able to provide.

An atomic function has been used to increment the points of the accumulator space, because it is faster than using the (++) operator (up to 20% faster).
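The original Fig. 5 kernel listing is likewise not reproduced in this copy; the following is a hypothetical sketch of a theta-parallel kernel matching the description above (the identifiers and the use of `atomicAdd` as the atomic increment are assumptions). With one thread per theta, each thread owns its own accumulator row, so a plain increment would also be correct; the atomic increment is shown because the paper reports it measured faster on its hardware:

```cuda
/* Hypothetical sketch of the CUDA kernel described in Section 5.3.
   One thread per theta value (in degrees); each thread scans the whole
   image and writes only into its own row of the accumulator. */
__global__ void hough_lines_kernel(const unsigned char *image,
                                   int rows, int columns,
                                   int *accumulator, int rho_max)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;  /* theta in degrees */
    if (t >= 180) return;                           /* 16x16 = 256 threads cover 180 thetas */
    float theta = t * 3.14159265f / 180.0f;
    float c = cosf(theta), s = sinf(theta);
    for (int y = 0; y < rows; y++)
        for (int x = 0; x < columns; x++)
            if (image[y * columns + x] == 0) {      /* dark pixels vote */
                int rho = (int)lroundf(x * c + y * s);
                if (rho >= 0 && rho <= rho_max)
                    atomicAdd(&accumulator[t * (rho_max + 1) + rho], 1);
            }
}

/* Launched as hough_lines_kernel<<<16, 16>>>(...) after copying the image
   into device memory, matching the 16-blocks-of-16-threads configuration
   described in the text. */
```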
We have compared the performance of the sequential CPU Matlab and C functions and the parallel CUDA kernel on 10 images with dimensions 256x256. The difference between the ten images is the dark-pixel capacity of each, which represents the data in these images. These ten images were applied to the sequential functions and the kernel, and the time was measured using the Matlab 7.8 Profiler for the Matlab function, the Visual Studio C++ 6.0 Profiler for the C function, and the CUDA Visual Profiler for the CUDA kernel. The speedups have been calculated and the resulting charts are drawn in Fig. 6.

We can see that the speedup for an image with 60% dark-pixel capacity is about 12x for the GPU vs. sequential C on a single CPU, and better than 23x for the GPU vs. sequential Matlab on a single CPU; these results were obtained at relatively low parallelism (just 180 threads) on almost the cheapest CUDA-enabled GPU on the market.

A dark-pixel capacity of more than 3% is required for the GPU to start giving a speedup, because when a poor parallel workload is given, the strong sequential computation speed of the CPU dominates.
Fig. 6. The speedup for GPU over C for a single CPU (a) and the speedup for GPU over Matlab for a single CPU (b).
