
Parallel Matlab

Laboratory for Computational Cell Biology


Overview

● The present situation in the lab
● What types of computation can easily be parallelized?
● Mathworks Distributed Computing Toolbox
● MatlabMPI
The Present Situation

● Many algorithms we use are running into computational limits on a
single workstation. On average, processing a movie takes 2 to 4 hours.
● Using dedicated 64-bit compute servers has relieved things a little,
but processing is still done serially.
● Many algorithms could potentially benefit from parallel processing.
Programs Suitable for Parallelization

● For the kind of 'embarrassingly parallel' processing jobs we are
talking about here, programs that process data that can easily be
chopped up into pieces are the easiest to port. Examples:
– Edge detection on separate frames of a movie
– Image segmentation
– Pure numerical functions that run in parallel with different data
sets (e.g. Monte Carlo simulations)
● Parallelization pays off when execution time >> communication time
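Chopping the data up is usually the first step, whichever toolkit is used. The following is a minimal sketch, assuming a movie whose frames can be processed independently; the frame count, the worker count and the variable names are made up for illustration and do not come from the lab's code.

```matlab
% Hypothetical sizes, chosen only for illustration.
n_frames  = 10;   % number of frames in the movie
n_workers = 3;    % number of workers / compute nodes

% Boundaries that split 1..n_frames into n_workers contiguous,
% near-equal chunks.
edges = round(linspace(0, n_frames, n_workers + 1));

chunks = cell(n_workers, 1);
for w = 1:n_workers
    chunks{w} = (edges(w) + 1):edges(w + 1);   % frame indices for worker w
end

% Show which frames each worker would process.
for w = 1:n_workers
    fprintf('worker %d: frames %s\n', w, mat2str(chunks{w}));
end
```

Each chunk can then be handed to one task or one compute node. Contiguous chunks keep the bookkeeping simple; if frames differ a lot in processing time, smaller chunks may balance the load better.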
Parallel Matlab Toolkits

● Matlab Distributed Computing Toolbox – a Mathworks implementation
of a job submission engine
● MatlabMPI – an implementation based on the open Message Passing
Interface standard
Matlab Distributed Computing Toolbox
● Components:
– Distributed Computing Engine (DCE):
● Jobmanager (1), Workers (many)
– DC Toolbox (client sessions)
– Shared file system
● Basic program flow:
– Find a jobmanager and create a jobmanager object
– Create a job
– Set file dependencies
– Chop the problem up into pieces and create a task for each piece
– Submit the job
– Wait for and gather results
Example Program (add numbers)

% Find the job manager
jm = findResource('jobmanager', 'name', 'myjobmanager1');

% Create a job object and list the files the workers need
job = createJob(jm, 'FileDependencies', ...
    {'/public/disttoolbox/adding.m'});

% Create tasks: one output argument each, inputs given as a cell array
createTask(job, @adding, 1, {1,2});
createTask(job, @adding, 1, {3,4});
createTask(job, @adding, 1, {5,6});

% Submit the job and wait until all tasks have finished
submit(job);
waitForState(job, 'finished');

% Get the results (one row per task)
results = getAllOutputArguments(job);

% Do something with the results.
for i = 1 : size(results,1)
    disp(results{i});
end
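The file adding.m is shipped to the workers but is not shown on the slides. The sketch below assumes it simply returns the sum of its two inputs and uses an anonymous function as a stand-in; the names task_args and expected are illustrative. Running the same argument lists serially gives a reference result to compare with what comes back from the workers.

```matlab
% Assumption: adding.m is not shown in the slides; this stand-in
% just sums its two inputs.
adding = @(a, b) a + b;

% Same argument lists as in the three createTask calls.
task_args = {{1, 2}, {3, 4}, {5, 6}};

% Serial reference run: one output per task, one row per task,
% matching the layout of the cell array of job results.
expected = cell(numel(task_args), 1);
for i = 1:numel(task_args)
    expected{i} = adding(task_args{i}{:});
end

for i = 1:numel(expected)
    disp(expected{i});
end
```

Checking a small serial run like this first makes it easier to tell a bug in the task function from a problem with the cluster setup.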
Evaluating a Function

● If you just want to quickly evaluate a function without going through
the whole circus of setting up jobs and tasks, use 'dfeval':
results = dfeval(@sum, {[1 1] [2 2] [3 3]})
results =
2
4
6
MatlabMPI (1)

● Components:
– MatlabMPI toolbox
– Shared file system (can be $HOME/matlab)

● Execute a MatlabMPI program:
– machines = {'lccbws001' 'lccbws002' 'lccbws003'};
– MatMPI_Delete_all;
– eval(MPI_Run('mpi_program', nr_of_nodes, machines));

● Example programs in:
– /usr/local/matlab/MatlabMPI
MatlabMPI (2)

● Basic program flow:
– Initialize MPI
– Create a communicator
– Get size and rank of the local node
– If rank == 0 (master node)
    – Gather data
    – Send data to compute nodes
    – Probe for and receive results
– Else (compute node)
    – Receive data from master node
    – Process data and calculate results
    – Send results to master node
– End
– Finalize MPI
Example program (print hostname of compute nodes)
% Initialize MPI.
MPI_Init;

% Create communicator object.
comm = MPI_COMM_WORLD;

% Get size and rank for the node that the program is running on
comm_size = MPI_Comm_size(comm);
my_rank = MPI_Comm_rank(comm);

% Create a unique tag id for this message
tag = 1;

if my_rank == 0 % master node
    % Get all the strings
    for k = 1:comm_size-1
        message = MPI_Recv(k, tag, comm);
        disp(message);
    end
else % compute node
    % Send string to rank 0 node
    [status,result] = unix('hostname');
    message = ['Message from node ' num2str(my_rank) ': ' result];
    MPI_Send(0, tag, comm, message);
end

% Finalize Matlab MPI.
MPI_Finalize;
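The rank test can also decide which part of the data a node works on, so the master does not have to send index lists around. The sketch below makes no MPI calls: it simulates the ranks in a plain loop on one machine, and the node count, frame count and the name frames_for_rank are hypothetical.

```matlab
comm_size = 4;    % hypothetical: 1 master (rank 0) + 3 compute nodes
n_frames  = 10;   % hypothetical movie length

% In a real run every node executes the loop body once with its own
% my_rank; the loop here only simulates that on a single machine.
frames_for_rank = cell(comm_size, 1);
for my_rank = 0:comm_size-1
    if my_rank == 0
        % Master node: collects results, processes no frames itself.
        frames_for_rank{my_rank + 1} = [];
    else
        % Compute node r takes frames r, r + (comm_size-1), ...
        frames_for_rank{my_rank + 1} = my_rank:(comm_size - 1):n_frames;
    end
end

for my_rank = 0:comm_size-1
    fprintf('rank %d: frames %s\n', my_rank, ...
        mat2str(frames_for_rank{my_rank + 1}));
end
```

Striding by the number of compute nodes spreads the frames evenly without any communication; only the results have to travel back to rank 0.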
Advantages

● Matlab DC Toolbox
– Fully integrated in Matlab by Mathworks itself.
– Jobmanager will take care of the distribution of jobs on the cluster, no need
for the user to think about this.
– Fairly easy to use and program tasks although the user still has to take care
of cutting the data / computation up in pieces.
– Easy to quickly evaluate a function in a parallel fashion.
● MatlabMPI
– Free
– Source code is available, which means we can extend functionality ourselves
– Complies with open MPI standard, which means that Matlab code can be
easily ported to other languages (C, C++, Fortran, etc)
– Not necessary to start up separate workers on nodes before running a job:
machines can be used in an ad hoc fashion and can be chosen at the time
of execution.
Disadvantages (1)

● Matlab DC Toolbox
– Comes at a cost per node
– Workers have to be started on all compute nodes. If these crash (and that
happened quite a number of times during experiments) they have to be
restarted manually.
– Every file used in the program has to be listed in the job's
FileDependencies property. These files have to be accessible from every node
(shared file system). This includes m-files as well, since the remote Matlab
sessions are not started under the account of the user who started the Matlab
client session.
– Proprietary technology, which makes it hard to port code to another language.
Disadvantages (2)

● MatlabMPI
– Matlab/toolbox licenses needed per node that the computation is run on.
– The user has to supply a list of machine names on which the computation
will run.
– More complex to program than the Matlab Toolbox (at least that was the first
impression) because of the rank number scheme used.
– A shared file system between computation nodes is needed to store
communication messages. Since we already have many of these shares, this
is not a real concern.
– Before each run of MPI_Run the shared directory has to be cleaned up with
a call to the function MatMPI_Delete_all. Again, this is not difficult to do,
but it has to be remembered.
– One has to be aware of potential deadlocks when waiting to receive
messages. These can be avoided by using timeouts and the MPI_Probe call. In
standard MPI non-blocking receives are available, but these have not (yet)
been implemented in MatlabMPI.
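One way to combine a timeout with probing is a polling loop that gives up after a fixed time instead of blocking in a receive. This is only a sketch: probe_fn is a hypothetical function handle assumed to wrap the MPI_Probe call and to return an empty array while no message is waiting. Here it is replaced by a stub so the loop can be tried without MatlabMPI, and the timeout values are arbitrary.

```matlab
timeout_s = 0.5;   % give up after this many seconds (arbitrary value)
poll_s    = 0.1;   % pause between probes (arbitrary value)

% Stub standing in for a wrapper around MPI_Probe (hypothetical):
% it never reports a pending message, so the loop will time out.
probe_fn = @() [];

got_message = false;
t0 = tic;
while toc(t0) < timeout_s
    pending = probe_fn();
    if ~isempty(pending)
        got_message = true;   % safe to do the blocking receive now
        break;
    end
    pause(poll_s);
end

if ~got_message
    disp('Timed out waiting for a message.');
end
```

In a real program the master would only call MPI_Recv once the probe reports a message, and would handle the timeout branch by logging or skipping the missing node rather than hanging.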
Conclusion

● Our lab has a couple of algorithms that will be suitable for
parallelization (to be identified together with the group).
● Both the DC Toolbox and MatlabMPI Toolbox will be
usable for experimentation.
● The MPI toolbox has the advantage that people can start
using it right away, since nothing has to be installed. Type
'help MatlabMPI' for a list of functions that you can use.
● The DC Toolbox has to be purchased and installed on all
machines first.
References

● http://www.mathworks.com/products/distribtb
● http://www.ll.mit.edu/MatlabMPI/
● http://www.mpi-forum.org/
