COSC 6374
 
Parallel Computation
Project I
Team 07
Neelima Paramatmuni, Padmaja Pentela, Shyamali Balasubramaniyan, Shravanthi Denthumdas, Samira Hemrajani
 
 
Problem Description:
The objective of our project is to parallelize a sequential code so that it can be executed on an arbitrary number N of processes. The original source code performs image analysis on an image that is treated as a matrix of characters. NOTE: one character is one pixel of a black-and-white image.
The most computation-intensive task in the code is the FFT operations. These were parallelized using the MPI-based FFT routines.
The major challenges of this task were:
Understanding the sequential code and investigating parallelization approaches.
Understanding the data distribution requirements of the FFTW routines.
Synchronizing tasks to enforce ordering if and when required.
We focused mainly on making our solution scalable, because we would be required to test our parallelized code on larger images that cannot be handled on a single processor.
Parallelization Strategy:
The following steps describe the stages of our task parallelization strategy.
Setting up the ‘parallel environment’ and use of MPI FFTW routines:
First, we initialize the parallel environment using MPI_Init(). Next, we create plans for the forward and inverse transforms, of type fftwnd_mpi_plan, using the function:
fftwnd_mpi_plan fftw2d_mpi_create_plan(MPI_Comm comm, int nx, int ny, fftw_direction dir, int flags);
 
Distributing the problem/data:
In order to obtain a load-balanced form of data distribution for the MPI FFTW routines, we use the following function:
fftwnd_mpi_local_sizes(fftplan, &local_height, &local_start, &local_ny_after_transpose, &local_y_start_after_transpose, &total_local_size);
This allows us to divide the image horizontally such that each process gets x rows and y columns, where x is less than the 1st dimension of the image and y is always equal to the 2nd dimension of the image.
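The row split described above can be illustrated with a small sketch. It assumes a plain even block distribution, which is not necessarily what FFTW does internally; the helper name slab_rows is hypothetical, and real code should rely on the values returned by fftwnd_mpi_local_sizes.

```c
#include <assert.h>

/* Hypothetical helper (not part of FFTW): even block split of nx rows
 * over nprocs processes. Illustration only; the values returned by
 * fftwnd_mpi_local_sizes are authoritative and may differ. */
void slab_rows(int nx, int nprocs, int rank,
               int *local_start, int *local_height)
{
    assert(nx >= 0 && nprocs > 0 && rank >= 0 && rank < nprocs);
    int base = nx / nprocs;   /* rows that every process receives */
    int extra = nx % nprocs;  /* the first `extra` ranks get one more row */
    *local_height = base + (rank < extra ? 1 : 0);
    *local_start = rank * base + (rank < extra ? rank : extra);
}
```

With 10 rows and 4 processes, this sketch gives ranks 0 and 1 three rows each and ranks 2 and 3 two rows each, so the slabs cover the image without overlap.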
Figure 1: Slab decomposition for data distribution (the image is divided along the X dimension; the Y dimension is always constant)
Parallelizing Reading the Image:
In accordance with our task parallelization strategy, every process carries out its task concurrently. The image read operation is parallelized in the same fashion. The image is read based on local_start and local_height, the parameters returned by our data layout function, so each process reads only its own subset of the data.
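A minimal sketch of such a per-process read follows. It assumes a raw, headerless, row-major image file with one byte per pixel, and it clips the slab to the image height in case the slab extends into rows that exist only as padding. The names slab_file_range and read_slab are hypothetical, not taken from the original code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: byte range of this process's rows in a raw,
 * headerless, row-major file with one byte per pixel (assumed layout).
 * Rows at or beyond img_h do not exist in the file and are skipped. */
void slab_file_range(int img_h, int img_w, int local_start, int local_height,
                     long *offset, long *nbytes)
{
    assert(img_h >= 0 && img_w >= 0 && local_start >= 0 && local_height >= 0);
    int end = local_start + local_height;  /* one past the last local row */
    if (end > img_h)
        end = img_h;                       /* clip to the real image */
    int rows = end > local_start ? end - local_start : 0;
    *offset = (long)local_start * img_w;
    *nbytes = (long)rows * img_w;
}

/* Reads this process's rows into buf; returns 0 on success, -1 on error. */
int read_slab(FILE *fp, int img_h, int img_w, int local_start,
              int local_height, unsigned char *buf)
{
    long offset, nbytes;
    slab_file_range(img_h, img_w, local_start, local_height, &offset, &nbytes);
    if (nbytes == 0)
        return 0;                          /* slab holds no image rows */
    if (fseek(fp, offset, SEEK_SET) != 0)
        return -1;
    return fread(buf, 1, (size_t)nbytes, fp) == (size_t)nbytes ? 0 : -1;
}
```

Each process would open the file itself and call read_slab with its own local_start and local_height, so no process touches rows owned by another.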
Padding the image:
Padding the image and the filter is a major part of the image analysis task. We need to introduce a ‘border’ around the original image due to the symmetry requirements of the FFT operation. This is achieved by padding the image with zeros on the right and at the bottom.
Figure 2: Image Padding
Since each process gets a subset of the rows of the data, the image padding in the parallel version has to be carried out in one of the following 3 ways:
Case 1: A process may have rows that require padding only on the right side (refer to Figure 2).
Case 2: A process may have rows that require padding on the bottom as well as on the right-hand side.
Case 3: A process may have a chunk of data that is only padding.
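The three cases can be handled by one loop that checks, for every element of the local slab, whether it lies inside the original image. The sketch below is an illustration under assumptions: the type cpx is a stand-in for FFTW 2's fftw_complex (a struct with re and im members), and pad_slab and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for FFTW 2's fftw_complex (assumption for this sketch). */
typedef struct { double re, im; } cpx;

/* Fills this process's slab of the padded image.
 * img holds only the image rows owned by this process, img_w pixels per
 * row, starting at global row local_start. out has local_height rows of
 * pad_w elements. */
void pad_slab(const unsigned char *img, int img_h, int img_w, int pad_w,
              int local_start, int local_height, cpx *out)
{
    assert(img_w <= pad_w && local_start >= 0 && local_height >= 0);
    for (int i = 0; i < local_height; i++) {
        int g = local_start + i;  /* global row index */
        for (int j = 0; j < pad_w; j++) {
            cpx *p = &out[(size_t)i * pad_w + j];
            /* Inside the image: copy the pixel. Otherwise: zero padding
             * on the right (j >= img_w) or at the bottom (g >= img_h). */
            int inside = (g < img_h && j < img_w);
            p->re = inside ? (double)img[(size_t)i * img_w + j] : 0.0;
            p->im = 0.0;
        }
    }
}
```

A slab that lies entirely inside the image corresponds to Case 1, a slab that straddles the last image row to Case 2, and a slab that starts at or below img_h to Case 3, where img is never read.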
Padding the filter:
The filter used for the image analysis task needs to be blown up to the size (same dimensions) of the padded image. The filter is padded in the same manner as the image, i.e. by appending zeros to the right and bottom.
FFT of image and filter:
The actual computation of the transform is performed by the function fftwnd_mpi. We replaced the fftwnd_one() calls in the original sequential version with the corresponding parallel versions:
fftwnd_mpi(fftplan, 1, padimg, NULL, FFTW_NORMAL_ORDER)
fftwnd_mpi(fftplan, 1, padfilter, NULL, FFTW_NORMAL_ORDER)
Convolution:
The FFTs of the padded image and the padded filter are computed first; these are then multiplied element by element to get the convolved output. Since this task is now divided over multiple processes, we modify the convolution loop to range from 0 to the number of rows on each process (the image height on each process).
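The per-process convolution loop can be sketched as an element-wise complex multiplication over the local rows only. The type cpx again stands in for FFTW 2's fftw_complex, and multiply_spectra is a hypothetical name rather than a function from the original code.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for FFTW 2's fftw_complex (assumption for this sketch). */
typedef struct { double re, im; } cpx;

/* Multiplies the local slabs of two spectra element by element.
 * Only local_height rows are visited, so no communication is needed.
 * out may alias img or flt because each product is computed first. */
void multiply_spectra(const cpx *img, const cpx *flt, cpx *out,
                      int local_height, int ny)
{
    assert(local_height >= 0 && ny >= 0);
    for (int i = 0; i < local_height; i++) {
        for (int j = 0; j < ny; j++) {
            size_t k = (size_t)i * ny + j;
            /* (a + bi)(c + di) = (ac - bd) + (ad + bc)i */
            double re = img[k].re * flt[k].re - img[k].im * flt[k].im;
            double im = img[k].re * flt[k].im + img[k].im * flt[k].re;
            out[k].re = re;
            out[k].im = im;
        }
    }
}
```

Because both spectra are distributed with the same plan, element k on one process always pairs with element k of the other array on the same process.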
Inverse FFT:
This is the inverse of the FFT operation. For the purpose of parallelization, we replace fftwnd_one() with fftwnd_mpi(ifftplan, 1, convolved, NULL, FFTW_NORMAL_ORDER).
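FFTW's transforms are unnormalized, so a forward transform followed by an inverse transform scales the data by the total number of elements, nx * ny. This report does not say where the original code applies the scaling; the sketch below assumes it is done on each local slab right after the inverse transform. The type cpx stands in for fftw_complex and normalize_slab is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for FFTW 2's fftw_complex (assumption for this sketch). */
typedef struct { double re, im; } cpx;

/* Divides every element of the local slab by nx * ny, the size of the
 * full padded transform, to undo the scaling of the round trip. */
void normalize_slab(cpx *data, int local_height, int nx, int ny)
{
    assert(local_height >= 0 && nx > 0 && ny > 0);
    double scale = 1.0 / ((double)nx * (double)ny);
    size_t count = (size_t)local_height * (size_t)ny;
    for (size_t k = 0; k < count; k++) {
        data[k].re *= scale;
        data[k].im *= scale;
    }
}
```

Note that the divisor is the global size of the padded image, not the size of the local slab.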
Depadding:
The depadding operation is carried out on the output of the inverse FFT operation. It separates the image from the padding that we introduced before the FFT operation.
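A minimal sketch of depadding on one process is given below. It assumes that the image occupies the top-left corner of the padded array (since the padding was added on the right and at the bottom), that the result is taken from the real part, and that values are rounded and clamped to the 0..255 range of a character pixel. Those choices, the type cpx, and the name depad_slab are assumptions, not details from the original code.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for FFTW 2's fftw_complex (assumption for this sketch). */
typedef struct { double re, im; } cpx;

/* Copies the image part of the local padded slab into out and returns
 * the number of image rows written (0 for a slab that is all padding). */
int depad_slab(const cpx *padded, int img_h, int img_w, int pad_w,
               int local_start, int local_height, unsigned char *out)
{
    assert(img_w <= pad_w && local_start >= 0 && local_height >= 0);
    int rows = 0;
    for (int i = 0; i < local_height && local_start + i < img_h; i++) {
        for (int j = 0; j < img_w; j++) {
            double v = padded[(size_t)i * pad_w + j].re;
            if (v < 0.0)
                v = 0.0;                   /* clamp below */
            if (v > 255.0)
                v = 255.0;                 /* clamp above */
            out[(size_t)i * img_w + j] = (unsigned char)(v + 0.5);
        }
        rows++;
    }
    return rows;
}
```

The returned row count tells the caller how many rows this process contributes when the output image is written or gathered.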