Parallel Matlab: Laboratory For Computational Cell Biology

Overview
● The present situation in the lab
● What types of computation can easily be parallelized?
● MathWorks Distributed Computing Toolbox
● MatlabMPI
The Present Situation
● Many algorithms we use are running into computational limits on a
single workstation. On average, processing a movie takes 2 to 4 hours.
● Using dedicated 64-bit computing servers has relieved things a little,
but processing is still done serially.
● Many algorithms could potentially benefit from parallel processing.
Programs Suitable for Parallelization
● For the kind of 'embarrassingly parallel' processing jobs we are
talking about here, programs whose data can easily be chopped up into
pieces are the easiest to port. Examples:
– Edge detection on separate frames of a movie
– Image segmentation
– Pure numerical functions that run in parallel with different data
sets (e.g. Monte Carlo simulations)
● Parallelization pays off when execution time >> communication time
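As an illustration of "chopping the data up into pieces", the sketch below splits the frame indices of a movie into contiguous chunks, one per worker. The frame count and worker count are hypothetical values chosen for the example; nothing here depends on either parallel toolkit.

```matlab
% Split the frame indices of a movie into contiguous chunks, one per worker.
n_frames  = 10;   % hypothetical movie length (assumption for this sketch)
n_workers = 3;    % hypothetical number of workers (assumption)

% Chunk boundaries: worker k gets frames edges(k)+1 .. edges(k+1).
edges = round(linspace(0, n_frames, n_workers + 1));

chunks = cell(1, n_workers);
for k = 1:n_workers
    chunks{k} = (edges(k) + 1):edges(k + 1);
end

% Show which frames each worker would process.
for k = 1:n_workers
    fprintf('worker %d -> frames %s\n', k, mat2str(chunks{k}));
end
```

Each chunk could then be handed to one task (DC Toolbox) or one compute node (MatlabMPI). Contiguous chunks keep neighbouring frames together; a round-robin split would work equally well when frames are independent.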
Parallel Matlab Toolkits
● Matlab Distributed Computing Toolbox – a MathWorks implementation of
a job submission engine
● MatlabMPI – an implementation based on the open Message Passing
Interface standard
Matlab Distributed Computing Toolbox
● Components:
– Distributed Computing Engine (DCE):
● Jobmanager (1), Workers (many)
– DC Toolbox (client sessions)
– Shared file system
● Basic program flow:
– Find a jobmanager and create a jobmanager object
– Create a job
– Set file dependencies
– Chop the problem up into pieces and create one task per piece
– Submit the job
– Wait for and gather results
Example Program (add numbers)
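% Find the job manager
jm = findResource('jobmanager','name','myjobmanager1');
% Create a job object; adding.m must be reachable from every worker
job = createJob(jm,'FileDependencies', ...
    {'/public/disttoolbox/adding.m'});
% Create one task: call adding(1,2) and collect 1 output argument
createTask(job, @adding, 1, {1,2});
submit(job);
% Wait until all tasks have finished before collecting results
waitForState(job, 'finished');
% Get the results
results = getAllOutputArguments(job);
% Do something with the results
for i = 1 : size(results,1)
    disp(results{i});
end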
Evaluating a Function
● If you just want to quickly evaluate a function without going through
the whole circus of setting up jobs and tasks, use 'dfeval':
results = dfeval(@sum, {[1 1] [2 2] [3 3]})
results =
    2
    4
    6
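When debugging, it can help to compute the same results serially on the client and compare them with what comes back from the cluster. The sketch below is such a serial reference using the standard `cellfun` function; it does not use the DC Toolbox at all, and the variable names are chosen for this example only.

```matlab
% Serial reference for the dfeval example: apply @sum to each input cell.
inputs = {[1 1] [2 2] [3 3]};

% 'UniformOutput', false keeps one result per cell in a cell array.
expected = cellfun(@sum, inputs, 'UniformOutput', false);

% Print each result on its own line.
for i = 1:numel(expected)
    disp(expected{i});
end
```

If the parallel results differ from this serial reference, the problem is more likely in the job setup (file dependencies, paths) than in the function itself.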
MatlabMPI (1)
● Components:
– MatlabMPI toolbox
– Shared file system (can be $HOME/matlab)
● Execute a MatlabMPI program:
– machines = {'lccbws001' 'lccbws002' 'lccbws003'};
– MatMPI_Delete_all;
– eval(MPI_Run('mpi_program', nr_of_nodes, machines));
● Example programs in:
– /usr/local/matlab/MatlabMPI
MatlabMPI (2)
● Basic program flow:
– Initialize MPI
– Create a communicator
– Get size and rank of the local node
– If rank == 0 // master node
    – Gather data
    – Send data to compute nodes
    – Probe for and receive results
– Else // compute node
    – Receive data from master node
    – Process data and calculate results
    – Send results to master node
– End
– Finalize MPI
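The rank number is what lets every node run the same program yet work on different data. The sketch below shows one possible way to derive a compute node's share of the frames from its rank (a round-robin assignment). It assumes rank 0 is the master and does no processing itself; `comm_size` and `n_frames` are hard-coded hypothetical values here, whereas in a real MatlabMPI program the size and rank would come from MPI_Comm_size and MPI_Comm_rank.

```matlab
% Assumption: rank 0 is the master, ranks 1..comm_size-1 are compute nodes.
comm_size = 4;    % hypothetical: 1 master + 3 compute nodes
n_frames  = 10;   % hypothetical number of movie frames

% Frames handled by a given compute-node rank (round-robin assignment).
frames_for_rank = @(my_rank) my_rank:(comm_size - 1):n_frames;

% Show the assignment for every compute node.
for r = 1:comm_size - 1
    fprintf('rank %d -> frames %s\n', r, mat2str(frames_for_rank(r)));
end
```

Because each node can compute its own share this way, the master only needs to send the shared input location rather than an explicit work list, which keeps communication small.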
Example program (print hostname of compute nodes)
% Initialize MPI.
MPI_Init;
% Create communicator object.
comm = MPI_COMM_WORLD;
% Get size and rank for the node that the program is running on
comm_size = MPI_Comm_size(comm);
my_rank = MPI_Comm_rank(comm);
% Create a unique tag id for this message
tag = 1;
if my_rank == 0 % master node
    % Get all the strings
    for k = 1:comm_size-1
        message = MPI_Recv(k, tag, comm);
        disp(message);
    end
else % compute node
    % Send string to rank 0 node
    [status,result] = unix('hostname');
    message = ['Message from node ' num2str(my_rank) ': ' result];
    MPI_Send(0, tag, comm, message);
end
% Finalize Matlab MPI.
MPI_Finalize;
Advantages
● Matlab DC Toolbox
– Fully integrated in Matlab by Mathworks itself.
– Jobmanager will take care of the distribution of jobs on the cluster, no need
for the user to think about this.
– Fairly easy to use and program tasks although the user still has to take care
of cutting the data / computation up in pieces.
– Easy to quickly evaluate a function in a parallel fashion.
● MatlabMPI
– Free
– Source code is available, which means we can extend functionality ourselves
– Follows the open MPI standard, which means that Matlab code can be
ported more easily to other languages (C, C++, Fortran, etc.)
– Not necessary to start up separate workers on nodes before running a job:
machines can be used in an ad hoc fashion and can be chosen at the time
of execution.
Disadvantages (1)
● Matlab DC Toolbox
– Comes at a cost per node
– Workers have to be started on all compute nodes. If these crash (and that
happened quite a number of times during experiments) they have to be
restarted manually.
– Every file used in the program has to be listed in the job's
'FileDependencies' property and has to be accessible from every node
(shared file system). This includes m-files, since the remote Matlab
sessions are not started under the account of the user who started the
Matlab client session.
– Proprietary technology which makes it hard to port code to another language.
Disadvantages (2)
● MatlabMPI
– Matlab/toolbox licenses needed per node that the computation is run on.
– The user has to supply a list of machine names on which the computation
will run.
– More complex to program than the DC Toolbox (at least that was the first
impression) because of the rank number scheme used.
– A shared file system between computation nodes is needed to store
communication messages. Since we already have many of these shares, this
is not a real concern.
– Before each run of MPI_Run the shared directory has to be cleaned up by
calling the function MatMPI_Delete_all. Again, this is not difficult to
do, but it has to be remembered.
– One has to be aware of potential deadlocks when waiting to receive
messages. These can be avoided by using timeouts and the MPI_Probe call. In
standard MPI non-blocking receives are available, but these have not (yet)
been implemented in MatlabMPI.
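A minimal sketch of the timeout idea is shown below: poll for a message at a fixed interval and give up after a deadline instead of blocking forever. The `probe` handle is a hypothetical stand-in that always reports "no message" so the sketch runs without MatlabMPI; in a real program it would wrap a call to MPI_Probe (check 'help MPI_Probe' for its exact arguments and return values, which are not shown in these slides). The timeout and polling interval are arbitrary example values.

```matlab
% Hypothetical stand-in for a probe call: returns [] when nothing is waiting.
probe = @() [];

timeout_s = 0.2;    % give up after this many seconds (example value)
poll_s    = 0.05;   % time between two probes (example value)

t0 = tic;
found = [];
while isempty(found) && toc(t0) < timeout_s
    found = probe();          % non-blocking check for a waiting message
    if isempty(found)
        pause(poll_s);        % nothing yet: wait a little and try again
    end
end

timed_out = isempty(found);
if timed_out
    disp('No message received before the timeout.');
end
```

Only when the probe reports a waiting message would the program go on to the blocking receive, so a crashed compute node leads to a timeout on the master rather than a hang.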
Conclusion
● Our lab has a couple of algorithms that will be suitable for
parallelization (identify with the group).
● Both the DC Toolbox and MatlabMPI Toolbox will be
usable for experimentation.
● The MPI toolbox has the advantage that people can start
using it right away, since nothing has to be installed. Type
'help MatlabMPI' for a list of functions that you can use.
● The DC Toolbox has to be purchased and installed on all
machines first.
References
● http://www.mathworks.com/products/distribtb
● http://www.ll.mit.edu/MatlabMPI/
● http://www.mpi-forum.org/