
Application Guidelines

Parallel Computing with MATLAB
on Amazon Elastic Compute Cloud

Parallel Computing with MATLAB on Amazon Elastic Compute Cloud 1
Accelerating the pace of engineering and science

Contents

Introduction
    Who Should Read this Paper
Is Cloud Computing Right for You?
MATLAB Parallel Computing Tools: Basic Setup and Requirements
Using Parallel MATLAB on Amazon EC2
    Setup
    Performing Parallel MATLAB Computations on Amazon EC2
    Managing Data
Setting up Parallel MATLAB on Amazon EC2
    Setting up Your Desktop Computer
    Setting up a Basic Compute Environment on Amazon Web Services
        Choosing an Amazon EC2 AMI and Instance
    Setting up MATLAB Distributed Computing Server
        Configuring an AMI with MATLAB Distributed Computing Server
        Configuring MATLAB Distributed Computing Server Launch Mechanics
    Setting up a Scheduler
        MathWorks Job Manager
        Third-Party Schedulers
    Network Setup
        Network Setup for Using MathWorks Job Manager
        Network Setup for Using Third-Party Schedulers
    Setting up the MATLAB Client on a User’s Desktop
        Setup for Using MathWorks Job Manager
        Setup for Using Third-Party Schedulers
Licensing
    License Management on Amazon EC2
The MathWorks Support Services
References


Introduction
Cloud computing has captured popular imagination with the immense possibilities this computing paradigm seems to offer. The term cloud computing itself is variously defined; it encompasses a broad range of IT capabilities that are accessible irrespective of the user’s geographic location. These capabilities are offered as a high-availability service by a third party on a pay-as-you-go basis, and the end user interacts with them over the Internet cloud. Location-agnostic availability is a key feature. More importantly, this computing paradigm lets organizations break free from the cost-prohibitive task of maintaining their own computing infrastructure.

You can distinguish among the cloud computing services that different vendors offer by the level of abstraction they provide: hardware as a service, or infrastructure cloud (e.g., Amazon Elastic Compute Cloud); platform (e.g., Google App Engine); and applications (e.g., Web-based e-mail services). In this paper, our interest is in cloud computing services that offer hardware as a service. Amazon Elastic Compute Cloud (Amazon EC2) is one of the better known services of this kind.

In a variety of industries and disciplines, such as science, engineering, finance, oil exploration, and bioinformatics, the demand for massive compute resources for cluster applications can spike for relatively brief periods of time. This is particularly true for groups with a research and development focus. These organizations can either invest in continually adding compute resources (as well as the people needed to maintain them) or outsource this work to a cloud computing service.

This paper outlines key requirements and steps for integrating MATLAB® parallel computing products, Parallel Computing Toolbox™ and MATLAB Distributed Computing Server™, for use with the Amazon cloud computing service. Setting up the tools for use with a cloud computing service is similar to setting them up for use on a traditional computer cluster, with a similar set of requirements. However, some advanced maneuvers are required, particularly in remotely installing and managing various software components. We highlight the additional requirements and setup steps in the sections that follow.

Who Should Read this Paper
System and cluster administrators should review the following sections:
1. Is Cloud Computing Right for You?
2. MATLAB Parallel Computing Tools: Basic Setup and Requirements
3. Setting up Parallel MATLAB on Amazon EC2
4. Licensing
5. The MathWorks Support Services

Users of MATLAB and Parallel Computing Toolbox should review the following sections:
1. Is Cloud Computing Right for You?
2. MATLAB Parallel Computing Tools: Basic Setup and Requirements
3. Using Parallel MATLAB on Amazon EC2

Is Cloud Computing Right for You?
The cloud computing paradigm offers several advantages. With a single button click, users can commandeer hundreds of additional computational resources for their applications. Moreover, with cloud computing, horizontal scaling (adding computers) is almost instantaneous compared to procuring, setting up, or otherwise acquiring additional computer systems. However, you must be aware of some fundamental issues before committing to this paradigm.

Getting Started, Setup, and Cost
Setting up applications to run on a cloud computing service requires a significant time investment. In our experience, an administrator with no experience with a cloud computing service can take approximately 6-8 hours to completely set up MATLAB parallel computing tools on Amazon EC2. This time includes reviewing Amazon documentation, uploading installation files, and performing basic tests on the Amazon EC2 cluster.

Cloud computing services typically charge by usage. With Amazon Web Services (of which Amazon EC2 is a part) you will bear three costs: using Amazon EC2, storing data and machine images on the Amazon storage service (S3), and any data transfers that you perform. Amazon provides an online tool to estimate monthly costs while using its services. You may also incur additional costs for software licenses. Review these costs before signing up for any cloud computing service.

Performance and Quality of Service
You need to consider two aspects of performance and quality of service. First, the interaction with a cloud computing service occurs over the Internet. Data transfer over the Internet is inherently slow and may become a significant bottleneck for users’ applications. Secondly, any cloud computing service relies heavily on virtualization. A virtualized environment has implications for the performance of user applications: system resources such as the disk I/O system and networking are shared among several virtual compute resources, whereas a compute cluster dedicates specialized resources to these purposes. This means that user applications that rely on interworker communication (using MPI, for example) or have significant disk reads and writes may see performance deterioration, in addition to the Internet-induced latencies. Data transfer from client desktop computers to the virtual compute resources will also suffer from these bottlenecks. Amazon’s service provides some freedom in the choice of these virtual compute resources to mitigate some of these effects. The fact that the operating environment is virtualized may also alter some basic assumptions about how software will behave, particularly when multiple software components need to communicate with each other, as we discuss in the next subsection.

Firing up appropriate compute resources on a cloud computing service can take a significant amount of time, too. For example, on Amazon EC2, launching an instance configured with MATLAB Distributed Computing Server can take up to 10 minutes, and this time increases with the size of the user code and data hosted on the instance. Taken together, these factors mean that the instantaneous horizontal scaling is only instantaneous relative to long hardware procurement and setup cycles.

Users’ ability to scale applications may also be limited by the number of available software licenses. For example, a MATLAB Distributed Computing Server license provides access to a certain number of workers. Once this pool of workers is exhausted, users must wait until the workers are released by computations other users are running.

Security and Intellectual Property
Security, both in terms of your organization’s network security and its intellectual property, is another issue that needs careful thought.

Interacting with a cloud computing service requires configuring your organization’s network security to allow necessary processes to communicate over the Internet. For example, the interactive parallel computing capability in MATLAB parallel computing tools requires MATLAB workers to establish a direct connection with the client MATLAB session on a user’s desktop. In addition, accessing certain features of the MATLAB parallel computing tools requires configuring desktop-sharing software (such as VNC) and a virtual private network (VPN). For the Amazon EC2 service, it is essential to be able to establish an SSH connection with a running instance so that someone can customize it for users’ needs.

In addition to network security, you must also consider whether your organization will permit transferring data and user programs through the organization’s firewall to an outside network. For example, users may wish to host their large data sets on Amazon’s services to avoid the cost of repeatedly transmitting the data over the Internet.

MATLAB Parallel Computing Tools: Basic Setup and Requirements
The fundamental setup of MATLAB parallel computing tools for use with a cluster requires four essential software components§:
1. MATLAB with Parallel Computing Toolbox: MATLAB and the toolbox provide users the language and tools for programming parallel MATLAB applications, as well as mechanisms to send applications for execution. In a typical setup, MATLAB and the toolbox are on a client, with which users interact directly and which connects to a cluster to access the compute services offered by MATLAB Distributed Computing Server.
2. MATLAB Distributed Computing Server: The server consists of workers that perform computations on the cluster computers. You need to run one server worker for each MATLAB worker your parallel computations require.
3. Scheduler: The scheduler manages the interaction between client computers and the cluster. MATLAB Distributed Computing Server comes with a basic scheduler, the MathWorks job manager, with which you can manage MATLAB jobs only. You can also use third-party schedulers if you have advanced cluster management and security needs. In this paper, we focus primarily on the MathWorks job manager.
4. License manager: The license manager manages software licenses. MATLAB Distributed Computing Server requires you to configure a license manager for serving worker keys. Note that only certain types of MATLAB and toolbox licenses require a license manager.

§ Note that a user can employ MATLAB and Parallel Computing Toolbox in a desktop-only mode, in which MATLAB takes over the role of scheduler and spawns four workers locally on the user’s desktop; these workers are tied to the user and the MATLAB session. Users can develop and test applications locally on their desktops in this mode before scaling up to clusters using MATLAB Distributed Computing Server.

Figure 1: Basic setup of MATLAB parallel computing tools. Parallel Computing Toolbox and MATLAB are installed on the users’ desktops. The server and the scheduler are installed on the cluster.

The MATLAB parallel computing tools enable both batch and interactive workflows. In a batch workflow, a MATLAB user can submit a job to the cluster scheduler, possibly shut down MATLAB, and retrieve results later, once the job has been executed. In an interactive workflow, a MATLAB user is connected directly with the MATLAB workers running on the cluster. The user sends commands that are executed immediately, and results are available as soon as the command execution is complete.

The following pairs should be able to communicate with each other for the tools to function:
1. Client MATLAB and the scheduler: Required for submitting jobs, receiving results, and initiating interactive sessions.
2. Workers and scheduler: Required for the scheduler to send code and data received as jobs from the client MATLAB to the workers, and for receiving results from individual workers.
3. Client MATLAB and workers: Required for interactive sessions.
4. License manager and all others: Depending on the setup, you may need to configure one or more instances of the license manager to manage licenses for MATLAB, MATLAB workers, and optionally a third-party scheduler.

Depending on how you choose to configure the tools, allowing such communication requires opening the necessary firewall ports and installing additional software.

Figure 2: Client MATLAB and MATLAB Distributed Computing Server on two computer networks.

The setup shown in Figure 2 mirrors a common cluster setup. The client MATLAB is installed on users’ desktops, while the server is installed on the Amazon EC2 machine images. Users connect with the Amazon EC2 cluster from their desktop computers the same way they would with a regular cluster, with some additional setup requirements.
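Before users submit work, an administrator may want to spot-check that the communicating pairs listed above can actually reach each other through the firewalls and the VPN. The Python sketch below tests plain TCP reachability from the machine it runs on. Treating a successful TCP connect as a useful first test is an assumption, and the host names and port numbers are hypothetical placeholders, not the ports MATLAB Distributed Computing Server actually uses (those are listed in the server’s network requirements). Run it on each side of a pair, because a firewall may allow traffic in one direction only.

```python
import socket
from typing import Dict, Tuple


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds."""
    try:
        # create_connection resolves the name and tries each returned address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and name-resolution failures
        return False


def check_pairs(targets: Dict[str, Tuple[str, int]], timeout: float = 3.0) -> Dict[str, bool]:
    """Check every labelled (host, port) target and report whether it was reachable."""
    return {label: can_connect(host, port, timeout) for label, (host, port) in targets.items()}


# Hypothetical targets as seen from a client desktop: replace the host names and
# ports with the values your administrator actually configured.
CLIENT_TARGETS = {
    "client -> scheduler (job manager host)": ("jobmanager-host.example", 50001),
    "client -> license manager": ("license-host.example", 50002),
}
```

Calling check_pairs(CLIENT_TARGETS) returns a dictionary that maps each label to True or False. A False result only says that the connection did not open within the timeout; it does not identify which firewall rule or route is at fault.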

Using Parallel MATLAB on Amazon EC2
Setup
For a MATLAB user to use a desktop installation of MATLAB to connect to a MATLAB Distributed Computing Server cluster running on Amazon EC2, your system administrator needs to perform two setups on the desktop:
1. VPN and/or SSH software with appropriate credentials: This is required to connect with MATLAB Distributed Computing Server, send computations, and retrieve results.
2. A Parallel Computing Toolbox configuration for your MATLAB installation: The toolbox configurations are available from the Manage Configurations item in the Parallel menu in MATLAB (this launches a Configurations manager GUI). Your system administrator may perform this setup manually by entering the appropriate network setup properties. Alternatively, you may receive a MAT-file containing configuration properties, which you can import into your MATLAB installation using the Import Wizard available from the File menu in the Configurations manager.

Performing Parallel MATLAB Computations on Amazon EC2
The steps required for performing parallel MATLAB computations on Amazon EC2 are similar to those for using the local workers provided by Parallel Computing Toolbox or a regular cluster that runs MATLAB Distributed Computing Server. However, you need to execute two steps before you can start using the Amazon EC2 cluster for MATLAB computations:
1. Connect to Amazon EC2: You must first establish a connection with the Amazon EC2 cluster using the VPN and/or SSH software your system administrator has installed on your computer. Your system administrator will provide instructions on how to do this. Once the connection has been established, you can start a MATLAB session.
2. Configure the MATLAB session: Execute the pctconfig command from the MATLAB command prompt to set the client’s hostname to the one assigned by the VPN server running on Amazon EC2. (Request assistance from your system administrator.)
>> pctconfig('hostname', 'ip-10-8-0-4.ec2.internal')
The steps to determine this hostname on a Windows desktop are outlined below.
a. Open a command console. (In the Start menu, click Run, enter cmd in the text box, and press Enter.) This opens a separate window.
b. Execute the command ipconfig /all from the command prompt. In the command window output, locate the adapter whose Description field contains the word VPN. Note the IP Address property for this adapter.
c. Using this IP address, execute the command nslookup ipaddress, substituting the address you noted.
d. Copy the hostname listed in the Name field. Use this hostname as the second argument to the pctconfig command in MATLAB.

Once you have completed these two steps, you are ready to perform MATLAB computations on your Amazon EC2 cluster. Note that because MATLAB communicates with Amazon EC2 over the Internet, the response can be slow, depending on the Internet traffic and the amount of data you transmit back and forth between your desktop and the Amazon services.

Some of the steps you can execute in MATLAB are shown below. For details, refer to the Parallel Computing Toolbox documentation available with the product installation as well as online at www.mathworks.com:
1. Select the EC2 Configuration as the default from the Parallel menu in MATLAB. As mentioned above, your administrator might set up the configuration, or it might be supplied to you as a MAT-file for importing into your MATLAB setup.
Figure 3: Choosing the appropriate Parallel Computing Toolbox configuration for Amazon EC2.
2. You can also query the cluster status using the findResource command.
Figure 4: Using the findResource command to query cluster status.

3. You can use the matlabpool command to initiate an interactive session for using parfor or any of the parallel routines available in Optimization Toolbox™ and Genetic Algorithm and Direct Search Toolbox™. Note that response time may be slow, depending on Internet traffic and the amount of data exchanged between your desktop and Amazon EC2.
Figure 5: An interactive parallel session.
Figure 6: Using the batch command to send a MATLAB script for execution on Amazon EC2. The batch command automatically uses the previously selected configuration.

4. Additionally, you can use functions such as batch, createJob, createMatlabpoolJob, or createParallelJob to send MATLAB scripts and functions for offline execution on the Amazon EC2 cluster.

Managing Data
Because your desktop and the Amazon EC2 cluster reside on two completely separate networks, there are certain differences you need to bear in mind while managing data sets for your computations. The file system on your desktop (and in your organization) is not shared with the computers on Amazon EC2. Thus, any MATLAB files you create on your desktop, or data files that you use within your organization, are not visible to MATLAB workers running on the Amazon EC2 cluster. As a result, both your code and your data files must be transferred from your desktop to the Amazon EC2 cluster for your computations. There are two alternatives:
1. Using Parallel Computing Toolbox configurations: Parallel Computing Toolbox configurations let you set a FileDependencies property. The files pointed to by this property are bundled together and transported to the cluster at the beginning of an interactive session or a batch submission. For multiple projects, you can create multiple copies of the Amazon EC2 configuration that your administrator has supplied, each with different file and path dependencies, and pass in the name of the appropriate configuration as needed.
Figure 7: The FileDependencies property lets you automatically transfer code and data to workers.
2. Using Amazon services to host data and code: For a large code base and large data sets, transmitting code and data every time you use the Amazon EC2 cluster can be very time consuming. Discuss your requirements with your system administrator if you have large data sets and would like to host them on Amazon services. You can consider two options:

a. Hosting on Amazon machine images (AMIs): This option is similar to keeping all your data and code on your desktop, except that you need to copy your data only once. Your administrator can simply copy the appropriate files and provide appropriate path settings through the PathDependencies property in a Parallel Computing Toolbox configuration (see Figure 7). This option may be particularly useful if the stored code base and data are common among several users within your organization. Note, however, that you are limited by the amount of storage allowed for the AMI instance that your administrator has signed up for. See Choosing an Amazon EC2 AMI and Instance below.
b. Hosting using Amazon storage services: If you must store very large data sets, consider using the Amazon Simple Storage Service (S3) and Amazon Elastic Block Store (EBS). Retrieving data from Amazon S3 requires understanding the S3 API and programming in languages other than MATLAB. Similarly, using Amazon EBS requires advanced maneuvers: EBS volumes cannot be shared between virtual machine instances, so a separate volume needs to be launched for each machine instance. With the appropriate MATLAB path manipulation in your MATLAB workers, you can make the code and data visible when you perform computations. Contact your system administrator for assistance.

Setting up Parallel MATLAB on Amazon EC2
As described in the basic setup section above, users perform parallel MATLAB computations by sending computations to MATLAB Distributed Computing Server workers running on a cluster. With Amazon EC2, the cluster is a set of virtual machine instances (instances of Amazon Machine Images, or AMIs, described later in this section). There are two ways to make an installation of MATLAB Distributed Computing Server available to these instances. The first approach is to install the server on the machine images directly. The server workers can be configured to be launched when instances of these machine images are launched. The second approach is to use the Amazon Elastic Block Store (EBS) service, in which you configure a snapshot of an EBS volume to hold the server product installation. Launching a server worker then requires that you launch an instance of an AMI, launch an EBS volume from a previously configured snapshot, connect the two together, and then launch worker processes. This process is repeated for each worker launch. For more information on EBS, visit http://aws.amazon.com/ebs

The first approach requires fewer steps for configuring machine images, maintaining them, and launching a cluster; it is the simpler option. The second approach introduces a few additional steps and is slightly more complex than the first. However, because multiple copies of AMIs are launched when a cluster is set up on Amazon EC2, it can provide other efficiencies, such as faster cluster launch. (Because machine images are smaller without the server product installation, it is faster to launch a virtual machine instance.) In the sections that follow, we describe the steps only for the first approach.

For both of these approaches, there are seven key steps in setting up MATLAB parallel computing tools for use with the Amazon cloud computing service:
1. Set up the desktop computer through which you will connect with Amazon services.
2. Set up the basic compute environment on the Amazon services.

3. Set up MATLAB Distributed Computing Server to run on the Amazon cluster.
4. Set up the scheduler.
5. Set up your network.
6. Set up client MATLAB with Parallel Computing Toolbox and other toolboxes on the MATLAB users’ desktop computers.
7. Set up the license manager.

Setting up Your Desktop Computer
Some operations related to configuring and running systems on Amazon EC2 require an SSH connection from your desktop computer to the running system. Most Linux- or UNIX-based systems come preconfigured with an SSH client. Windows users can use PuTTY, a free SSH client. In addition to an SSH client, we recommend setting up a desktop-sharing client on your machine, such as VNC (with a corresponding server on your Amazon EC2 machine image), so that you can access graphics-based utilities and applications installed on the Amazon EC2 machine image. You will also need the command line utilities that Amazon provides (which require Java) for configuring the Amazon services. For detailed instructions on setting up these tools for use with Amazon EC2, see the Getting Started guide on the Amazon EC2 Website. We also recommend installing a Firefox browser plug-in (ElasticFox) that enables you to perform operations such as launching and closing systems with simple button clicks. It is available for download from the following URL: http://developer.amazonwebservices.com/connect/entry.jspa?externalID=609
Figure 8: ElasticFox, a browser plug-in for common Amazon EC2 related tasks.
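A quick way to confirm that a desktop has the command-line tools this section lists is to look them up on the PATH. The Python sketch below does that with the standard library’s shutil.which. The executable names are assumptions (PuTTY’s command-line program, for example, is plink, and VNC viewers go by different names on different systems), so adjust them to what you actually install. The check only confirms that a program can be found, not that it is configured correctly; ElasticFox, being a browser plug-in, is not covered.

```python
import shutil
from typing import Dict, List, Optional

# Candidate executable names for each tool; these names are assumptions, not requirements.
DESKTOP_TOOLS: Dict[str, List[str]] = {
    "SSH client": ["ssh", "putty", "plink"],
    "Java runtime (for Amazon command line utilities)": ["java"],
    "VNC viewer": ["vncviewer", "tvnviewer"],
}


def find_tools(tools: Dict[str, List[str]]) -> Dict[str, Optional[str]]:
    """Map each tool label to the path of the first candidate found on the PATH, or None."""
    found: Dict[str, Optional[str]] = {}
    for label, candidates in tools.items():
        found[label] = next(
            (path for path in (shutil.which(name) for name in candidates) if path),
            None,
        )
    return found


def missing_tools(tools: Dict[str, List[str]]) -> List[str]:
    """Return the labels of tools for which no candidate executable was found."""
    return [label for label, path in find_tools(tools).items() if path is None]
```

For example, printing missing_tools(DESKTOP_TOOLS) lists whatever still needs to be installed on that desktop.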
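The hostname lookup that users perform before calling pctconfig (steps a through d under Performing Parallel MATLAB Computations on Amazon EC2) is another desktop-side task an administrator may want to script. The Python sketch below assumes the ipconfig /all layout in which each adapter section starts at the left margin and its properties are indented, and it accepts both the "IP Address" label and the "IPv4 Address" label used by newer Windows versions. The reverse lookup uses socket.gethostbyaddr, which goes through the operating system’s resolver and usually, but not necessarily, returns the same name as nslookup. The function names are illustrative and not part of any MathWorks tool.

```python
import re
import socket
from typing import Optional

_IPV4 = r"(\d{1,3}(?:\.\d{1,3}){3})"


def vpn_ip_from_ipconfig(text: str, marker: str = "VPN") -> Optional[str]:
    """Return the IPv4 address of the first adapter whose Description mentions `marker`."""
    # A new adapter section starts on a line with no leading whitespace.
    for section in re.split(r"\n(?=\S)", text):
        desc = re.search(r"Description[ .]*:\s*(.*)", section)
        if not desc or marker.lower() not in desc.group(1).lower():
            continue
        addr = re.search(r"IP(?:v4)? Address[ .]*:\s*" + _IPV4, section)
        if addr:
            return addr.group(1)
    return None


def reverse_lookup(ip: str) -> str:
    """Resolve an IP address to a host name (roughly what nslookup reports in its Name field)."""
    return socket.gethostbyaddr(ip)[0]


def pctconfig_command(hostname: str) -> str:
    """Build the MATLAB command that sets the client hostname for Parallel Computing Toolbox."""
    return "pctconfig('hostname', '{}')".format(hostname)
```

On a Windows desktop connected to the VPN, an administrator could capture the output of ipconfig /all (for example with subprocess.run and capture_output=True), pass it to vpn_ip_from_ipconfig, resolve the result with reverse_lookup, and print pctconfig_command(...) for the user to paste at the MATLAB prompt.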

Setting up a Basic Compute Environment on Amazon Web Services
To use the MATLAB parallel computing tools on Amazon EC2, you must sign up for two Amazon services. The first, the Amazon EC2 service, provides you with the computing infrastructure. This service lets you configure and launch instances of Amazon Machine Images, or AMIs, on which you can run your applications. The AMIs are in effect compressed, encrypted copies of an entire operating system (Amazon EC2 currently supports Linux-based systems), including any application or service you have installed. The second, Simple Storage Service (S3), lets you store your data as well as the system images that you configure as a part of the Amazon EC2 service. For the purposes of configuring the AMIs, the Amazon S3 service does not require any special configuration steps. Once you sign up for these services, you can begin accessing and configuring them using the Amazon utilities you have previously downloaded and configured on your desktop computer. For details, review the documentation available on the Amazon Website: http://developer.amazonwebservices.com/connect/kbcategory.jspa?categoryID=87

Choosing an Amazon EC2 AMI and Instance
The first step in using Amazon EC2 is to create and configure an AMI. Amazon enables you to build your own AMIs from scratch by bundling and uploading a custom operating system installation from your own computers. A simpler way is to modify and extend template AMIs that Amazon provides. Once you have a base AMI chosen or set up, you can install and configure applications and services based on your requirements. To configure an AMI, you first need to launch an instance of the AMI. After configuring an AMI, you must store your customized AMIs at the Amazon S3 service. Amazon provides utilities to register this AMI so you can retrieve and launch saved AMIs later.

Amazon distinguishes among instances on the basis of the size and memory requirements of the AMI. The instance you choose must satisfy two key requirements:
1. General System Requirements: The Amazon AMIs are based on Linux®, so you must meet the system requirements listed at the following URL: www.mathworks.com/support/sysreq/current_release/linux.html
MathWorks products are supported on the following Linux distributions:
• 32-bit products: Red Hat® Enterprise Linux v.4 and above, Debian® 4.0 and above, OpenSuSE 1.0 and above, Fedora™ Core 4 and above
• 64-bit products: Debian 4.0 and above
Other distributions not listed above must be built using Kernel 2.4.x or 2.6.x, and glibc (glibc6) 2.3.4 and above.
2. System Requirements for MATLAB Distributed Computing Server: Choose an instance that matches your disk space and RAM requirements. MATLAB Distributed Computing Server requires approximately 3.5GB of disk space (equivalent to installing the entire suite of MathWorks products). Also, a server worker is a headless MATLAB process and consumes approximately the same amount of system resources as MATLAB (1GB RAM is recommended for 32-bit MATLAB). For our purposes, you can choose from among the smaller instances. Be careful when you choose between a 32-bit and a 64-bit instance. A 64-bit server will simply not work on a 32-bit instance. Once MATLAB Distributed Computing Server is configured on a 32-bit AMI, it will still function in 32-bit mode even when this AMI is launched as a 64-bit instance.
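The instance-selection rules above can be written down as a small check. The Python sketch below encodes the figures this guide gives: about 3.5GB of disk for the server, about 1GB of RAM per 32-bit worker, and the rule that a 64-bit server does not run on a 32-bit instance while a 32-bit server does run on a 64-bit one. Applying the 1GB figure to 64-bit workers and ignoring space for the operating system, user code, and data are simplifying assumptions. The instance values in the usage lines are illustrative, not a statement about any particular Amazon instance type.

```python
from dataclasses import dataclass
from typing import List

SERVER_DISK_GB = 3.5     # approximate disk footprint of the server installation (from this guide)
RAM_PER_WORKER_GB = 1.0  # recommended RAM per 32-bit worker; assumed to hold for 64-bit too


@dataclass
class InstanceSpec:
    label: str      # free-form name, not an Amazon instance type identifier
    arch_bits: int  # 32 or 64
    disk_gb: float
    ram_gb: float


def check_instance(spec: InstanceSpec, server_bits: int, workers: int) -> List[str]:
    """Return a list of problems; an empty list means the instance looks adequate."""
    problems: List[str] = []
    # A 32-bit server runs in 32-bit mode on a 64-bit instance, but not the other way round.
    if server_bits == 64 and spec.arch_bits == 32:
        problems.append("a 64-bit server will not run on a 32-bit instance")
    if spec.disk_gb < SERVER_DISK_GB:
        problems.append(f"not enough disk: need about {SERVER_DISK_GB} GB, have {spec.disk_gb} GB")
    needed_ram = workers * RAM_PER_WORKER_GB
    if spec.ram_gb < needed_ram:
        problems.append(
            f"not enough RAM for {workers} worker(s): need about {needed_ram} GB, have {spec.ram_gb} GB"
        )
    return problems


# Illustrative usage with made-up specifications.
small = InstanceSpec("small 32-bit instance", 32, disk_gb=160, ram_gb=1.7)
print(check_instance(small, server_bits=32, workers=1))  # prints []
```

Returning a list of problems, rather than a single yes/no, lets an administrator see every reason an instance is unsuitable in one pass.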

Setting up MATLAB Distributed Computing Server
Configuring an AMI with MATLAB Distributed Computing Server
Once you have selected and launched an AMI instance using Amazon’s command-line utilities, you can connect to it using an SSH client and begin configuring it for use. We recommend that you install a desktop-sharing server (e.g., VNC) on the AMI. This will let you access the graphics-based utilities and applications on the AMI from your desktop computer. We also recommend that once you have performed the basic configuration steps on the template AMI, you save a copy as a base AMI before further customizing it with MATLAB and other software.

Once you have created a basic AMI and have a running instance, it is time to configure it with MATLAB Distributed Computing Server. The installation process requires the MATLAB Distributed Computing Server installation package to be available on the running AMI instance. There are two ways to achieve this:
1. Bundle and upload the installer from your desktop. You can copy the required files from the MathWorks installation DVD or download them from your MathWorks account. If you use this option, you must use the Amazon command line utilities to bundle and upload the installation files to the base AMI.
2. Download the installer directly to a running instance using a Web browser. Again, this requires logging in to your MathWorks account and downloading the appropriate installers for MATLAB Distributed Computing Server.
Once the installer is available on the instance, you can launch the MATLAB installer and follow the installation process.
Figure 9: Installing MATLAB Distributed Computing Server on an Amazon EC2 AMI.

We recommend that you save this instance as a separate AMI. For the remainder of this document, we use the designation Worker-AMI to refer both to a saved AMI configured with MATLAB Distributed Computing Server and to a running instance of this AMI. Once you have configured and saved this AMI, you can launch several instances of it to serve as workers for your parallel MATLAB computations.
Figure 10: Launching an instance of an Amazon EC2 AMI using ElasticFox.

Configuring MATLAB Distributed Computing Server Launch Mechanics
When using the MathWorks job manager, all server workers must register with the job manager before computations are sent to them. The registration happens automatically as a part of the worker launch process. This means that you must first launch an AMI instance that fires up the MathWorks job manager processes and supply the host URL/IP address of this instance to the worker launch processes. There are three ways to launch MATLAB Distributed Computing Server workers using the Worker-AMI:
1. Manually log in to a running Worker-AMI instance and launch the MATLAB Distributed Computing Server worker processes using the appropriate launch scripts that come with the MATLAB Distributed Computing Server installation.
2. Pass configuration data to the instance at startup. Amazon EC2 lets you pass in arbitrary data to an instance on startup. For example, using the ElasticFox Firefox browser plug-in, you can pass in arbitrary text to an instance that can later be read by the instance boot-up scripts. Therefore, it is possible to configure your MATLAB Distributed Computing Server setup through instance data. There are a number of possible ways to do this:
a. You can pass in a shell script and eval the shell script at instance startup. In this shell script, you can call any UNIX® command, including the commands used for starting server workers.
b. You can pass in structured data (e.g., XML) and have some tool read this XML on instance startup.

3. You can hard-code the startup of the server workers using the traditional Linux boot script techniques described in the server documentation. However, this implies that the startup sequence is hard-coded into an Amazon image, which means that all instances launched from this image will perform exactly the same startup steps. There is no way to parameterize the startup.

No special settings are required for launching server workers on Worker-AMI instances when using third-party schedulers. With third-party schedulers, the server workers are launched like any other application at the beginning of a job and are shut down when the job is complete.

Setting up a Scheduler

MathWorks Job Manager

The MathWorks job manager is a simple scheduler that ships with MATLAB Distributed Computing Server. It can manage only MATLAB jobs, and it processes them in first-in, first-out (FIFO) order. The job manager operates in a service-oriented architecture (SOA) fashion: it requires the server workers to already be running before users submit computations to it. Moreover, the server workers remain alive between jobs unless they fail or are explicitly shut down.

As we noted before, the server workers must know the location of the job manager host to register themselves with the job manager before they start receiving computations. This means that the typical launch sequence of the cluster begins with the launch of the Scheduler-AMI (which starts up the job manager), followed by the Worker-AMI launches. Once these instances are brought up and connected, both the Worker-AMIs and the server workers must remain alive.

The job manager is available with the installation of MATLAB Distributed Computing Server. This means that you can redesignate a Worker-AMI as the Scheduler-AMI and modify the appropriate launch scripts to start the job manager processes in place of the server worker processes.

A key requirement when using the server with the job manager is that the worker processes be able to identify each other by hostname, and that the hostname with which a computer identifies itself be the same as the hostname with which it is visible to others. Because AMI instances are virtualized, calling the hostname command returns the hostname of the virtualized server and not that of the underlying physical computer. As a workaround, you can remove the etc/hostname script on the Worker-AMI. Review the network requirements for using the server with the MathWorks job manager: www.mathworks.com/products/distriben/requirements.html

Third-Party Schedulers

Third-party schedulers provide additional scheduling and security features compared to the MathWorks job manager. These schedulers also provide the option of managing additional applications on your cluster. Each scheduler has its own mechanisms to maintain the list of cluster nodes and to designate a head node and worker nodes. Installing the third-party schedulers is similar to the process of installing MATLAB Distributed Computing Server. The server workers are launched on already running Worker-AMI instances; thus, the scheduler can launch and shut down server workers like any other application.

The direct integration of schedulers such as Platform LSF®, Microsoft Windows® Compute Cluster Server, PBS Pro®, and TORQUE with MATLAB parallel computing tools assumes the availability of a shared file system between the client computer and the cluster computers. This assumption is not valid when client MATLAB and the MATLAB Distributed Computing Server workers are on completely separate networks. As a result, you must use the API for the generic scheduler interface to customize the job submission and retrieval mechanisms for all third-party schedulers. Customized scripts must reside both on the MATLAB users' desktop computers and on the Worker-AMI. For details on implementing these, review Using Generic Scheduler Interface in the Programming Distributed Jobs and Programming Parallel Jobs sections of the Parallel Computing Toolbox documentation. Detailed examples are available with the MATLAB Distributed Computing Server installation.

Network Setup

Because the desktop computer, the scheduler, and the server workers reside on different networks, additional network setup is required on both the desktop client and the Amazon instances. In fact, the three must be able to communicate with each other for job submission (desktop and scheduler, scheduler and workers), results retrieval (desktop and scheduler, scheduler and workers), and interactive parallel sessions (desktop and workers).

Network Setup for Using MathWorks Job Manager

This section describes setting up a cluster managed by the MathWorks job manager. Third-party schedulers will likely be able to use the same setup, but we encourage you to contact the scheduler vendors for more information.

Using virtual private networks (VPNs) is one of several ways to set up the communication between the different software components. We successfully experimented with OpenVPN, an open source VPN program. We must establish two VPNs. The first VPN is required between the clients within your organization (i.e., users' desktops and the license manager) and the Scheduler-AMI (which runs the job manager), to enable users to find the job manager, submit jobs to it, and retrieve results from it. A second VPN is required between the Worker-AMIs (which run the MATLAB Distributed Computing Server workers) and the Scheduler-AMI. The two VPNs must be configured to push out their virtual networks to each other. This setup is required for workers to establish interactive connections with the client MATLAB running on users' desktops, as well as to reach the license manager to check out worker keys.

There are eight steps required to establish communication between the software running in your organization and the software running on Amazon EC2:
1) Set up the first VPN server on the Scheduler-AMI.
2) Set up the VPN clients for this server on users' desktops and the license server.
3) Set up the license server.
4) Set up the second VPN server on the Scheduler-AMI.
5) Set up the VPN client for this server on the Worker-AMIs.
6) Set up the job manager launch configuration file.
7) Set up the worker launch configuration file.
8) Configure the routing table on the Scheduler-AMI.
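Why two VPNs are not enough on their own, without the push entries and routing on the Scheduler-AMI, can be illustrated with a toy reachability model. This Python sketch is only an illustration under our own assumptions: the component and network labels are hypothetical, and the model simply treats the Scheduler-AMI as a member of both VPNs that relays traffic between them once steps 1 to 8 are in place.

```python
"""Toy reachability model of the two-VPN layout (illustration only)."""

# VPN membership of each component: "vpn1" is the organization-side VPN,
# "vpn2" the Worker-AMI VPN. All labels are hypothetical.
MEMBERSHIP = {
    "desktop": {"vpn1"},
    "license_server": {"vpn1"},
    "scheduler": {"vpn1", "vpn2"},  # the Scheduler-AMI runs both VPN servers
    "worker": {"vpn2"},
}

# Component pairs that must be able to talk to each other.
REQUIRED = [
    ("desktop", "scheduler"),      # job submission and results retrieval
    ("scheduler", "worker"),       # job submission and results retrieval
    ("desktop", "worker"),         # interactive parallel sessions
    ("worker", "license_server"),  # checking out worker keys
]


def reachable(a, b, routed):
    """True if a and b share a VPN or, when the scheduler routes between
    the VPNs (push entries, IP forwarding, routes), if each of them shares
    a VPN with the scheduler."""
    if MEMBERSHIP[a] & MEMBERSHIP[b]:
        return True
    hub = MEMBERSHIP["scheduler"]
    return routed and bool(MEMBERSHIP[a] & hub) and bool(MEMBERSHIP[b] & hub)


def missing_paths(routed):
    """Required pairs that the given configuration cannot connect."""
    return [(a, b) for a, b in REQUIRED if not reachable(a, b, routed)]
```

Without routing, missing_paths(routed=False) reports the desktop-to-worker and worker-to-license-server paths, which are exactly the paths the push entries and the routing-table step are meant to open.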

1) Setting up the first VPN server on the Scheduler-AMI:
1. Create an SSL root certificate (ca), a server certificate (cert), and a private key (key), as well as a client certificate and a private key.
2. Generate the Diffie-Hellman parameters.
3. Create a configuration file with entries for:
a. (Optional) The port on which you want the VPN server to listen (default 1194). You will have to open this port on your firewall.
b. Device type (set as TAP)
c. Server protocol (set as TCP)
d. (Optional) Base IP (or subnet) (default 10.8.0.0)
e. Locations for the certificates and the key obtained in step 1
Note that you will need to add a push entry to the configuration file for the second VPN subnet in order to use interactive capabilities; see the steps below. A configuration file may have entries as follows:
    port 1194
    proto tcp
    dev tap
    ca /etc/openvpn/keys/ca.crt
    cert /etc/openvpn/keys/externalserver.crt
    key /etc/openvpn/keys/externalserver.key #This file should be kept secret
    dh /etc/openvpn/keys/dh1024.pem
    server 10.8.0.0 255.255.255.0
    ifconfig-pool-persist externalipp.txt
    push "route 10.9.0.0 255.255.255.0"
    duplicate-cn
    keepalive 10 120
    comp-lzo
    persist-key
    persist-tun
    status external-openvpn-status.log
    verb 3
4. Once you have set up the configuration file, you can start the VPN server and pass the location of the configuration file you created in step 3 as an input.

2) Setting up the VPN client (first VPN server) on a user's desktop and the license server:
1. Copy the root certificate (ca), client certificate (cert), and client key (key) that you created in step 1 above to the user's desktop.
2. Create a configuration file with entries for:
a. Locations for the certificates and keys generated in step 1

b. Location of the server
c. Device type and server protocol (the same as for the VPN server)
d. An instruction that the VPN client must keep trying indefinitely to resolve the hostname of the VPN server
Entries in the configuration file may resemble the following example (on a Windows client). Note the double backslash ('\\') in the path names. Double quotes are necessary if there are spaces in the path name.
    client
    dev tap
    ca C:\\MyCertificates\\ca.crt
    cert C:\\MyCertificates\\client.crt
    key C:\\MyCertificates\\client.key
    proto tcp
    remote ec2-75-101-227-140.compute-1.amazonaws.com 1194
    resolv-retry infinite
    nobind
    persist-key
    persist-tun
    ns-cert-type server
    comp-lzo
    verb 3
3. Once you have set up the configuration file, you can start the VPN client and pass the location of the configuration file you created in step 2 as an input.

Figure 11: Network setup requires establishing two virtual private networks (VPNs).
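The server and client files in steps 1) and 2) differ mainly in a handful of parameters, so they can be generated rather than typed by hand. The Python sketch below is our own illustration: all function and parameter names are hypothetical, and the directive list simply mirrors the examples above. It uses the standard ipaddress module to turn a CIDR block such as 10.8.0.0/24 into the network/netmask pair that the server and push directives expect, and ntpath to build Windows paths whose backslashes are then doubled and, when they contain spaces, quoted.

```python
import ipaddress
import ntpath


def openvpn_path(path):
    """Double the backslashes of a Windows path; quote it if it has spaces."""
    escaped = path.replace("\\", "\\\\")
    return f'"{escaped}"' if " " in escaped else escaped


def server_config(cidr="10.8.0.0/24", port=1194, push_cidrs=("10.9.0.0/24",),
                  keydir="/etc/openvpn/keys", name="externalserver",
                  prefix="external"):
    """Text of a server file like the first VPN server example."""
    net = ipaddress.ip_network(cidr)
    lines = [f"port {port}", "proto tcp", "dev tap",
             f"ca {keydir}/ca.crt",
             f"cert {keydir}/{name}.crt",
             f"key {keydir}/{name}.key",
             f"dh {keydir}/dh1024.pem",
             # "server <network> <netmask>" sets the VPN's base subnet
             f"server {net.network_address} {net.netmask}",
             f"ifconfig-pool-persist {prefix}ipp.txt"]
    for other_cidr in push_cidrs:
        other = ipaddress.ip_network(other_cidr)
        # push arguments are wrapped in straight double quotes
        lines.append(f'push "route {other.network_address} {other.netmask}"')
    lines += ["duplicate-cn", "keepalive 10 120", "comp-lzo", "persist-key",
              "persist-tun", f"status {prefix}-openvpn-status.log", "verb 3"]
    return "\n".join(lines) + "\n"


def client_config(remote, port=1194, certdir=r"C:\MyCertificates"):
    """Text of a Windows client file like the desktop client example."""
    def cert(filename):
        return openvpn_path(ntpath.join(certdir, filename))

    lines = ["client", "dev tap",
             "ca " + cert("ca.crt"),
             "cert " + cert("client.crt"),
             "key " + cert("client.key"),
             "proto tcp",  # must match the server's protocol
             f"remote {remote} {port}",
             "resolv-retry infinite", "nobind", "persist-key", "persist-tun",
             "ns-cert-type server", "comp-lzo", "verb 3"]
    return "\n".join(lines) + "\n"
```

For example, server_config() reproduces the first server's settings, and server_config(cidr="10.9.0.0/24", port=1195, push_cidrs=("10.8.0.0/24",), name="server", prefix="internal") is a starting point for the second server, which additionally needs client-to-client. Newer OpenVPN releases deprecate comp-lzo and ns-cert-type, so check the directives against the version you install.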

3) Setting up the license server:
1. Set up a VPN client on the license server as described in (2) above.
2. Enable IP forwarding so that incoming network traffic from the virtual networking interface can go to the real networking interface (echo 1 > /proc/sys/net/ipv4/ip_forward, or add net.ipv4.ip_forward=1 to /etc/sysctl.conf and run the command sysctl -p).

4) Setting up a VPN server on the Scheduler-AMI:
1. Enable IP forwarding on the Scheduler-AMI (echo 1 > /proc/sys/net/ipv4/ip_forward).
2. Create a configuration file with entries for:
a. The same certificate and key locations, and the same device and protocol types, as for the first VPN server
b. The port on which you want the VPN server to listen. This should be different from the one you specified for the first VPN server (e.g., 1195).
c. A base IP (subnet) that is different from that of the first VPN server (e.g., 10.9.0.0)
d. A 'push' entry for the first VPN server's subnet
A configuration file may have entries as follows:
    server 10.9.0.0 255.255.255.0
    port 1195
    proto tcp
    dev tap
    push "route 10.8.0.0 255.255.255.0"
    ca /etc/openvpn/keys/ca.crt
    cert /etc/openvpn/keys/server.crt
    key /etc/openvpn/keys/server.key # This file should be kept secret
    dh /etc/openvpn/keys/dh1024.pem
    ifconfig-pool-persist internalipp.txt
    client-to-client
    duplicate-cn
    keepalive 10 120
    comp-lzo
    persist-key
    persist-tun
    status internal-openvpn-status.log
    verb 3
3. Start the VPN server by specifying the configuration file you created in step 2 as the input.
4. Add a push entry for this VPN server's subnet to the first VPN server's configuration file. Restart the first VPN server if it is already running.
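Two mistakes are easy to make when adding the second VPN server: giving it a subnet that overlaps the first one, or reusing the first server's port (both servers run on the Scheduler-AMI, so they cannot listen on the same TCP port). A small Python check, with hypothetical function names of our own, catches both and also reads the Linux IP forwarding flag in /proc/sys/net/ipv4/ip_forward that the echo and sysctl commands above set.

```python
import ipaddress
from pathlib import Path


def check_second_vpn(first_cidr, first_port, second_cidr, second_port):
    """Return a list of conflicts between the two VPN servers (empty if none)."""
    problems = []
    first = ipaddress.ip_network(first_cidr)
    second = ipaddress.ip_network(second_cidr)
    if first.overlaps(second):
        problems.append(f"subnets {first} and {second} overlap")
    if first_port == second_port:
        problems.append(f"both servers use port {first_port}")
    return problems


def ip_forwarding_enabled(flag_file="/proc/sys/net/ipv4/ip_forward"):
    """True if the kernel forwards IPv4 packets (the file contains "1")."""
    return Path(flag_file).read_text().strip() == "1"
```

With the values used in this section, check_second_vpn("10.8.0.0/24", 1194, "10.9.0.0/24", 1195) returns an empty list; running ip_forwarding_enabled() on the Scheduler-AMI after enabling forwarding should return True.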

5) Setting up the VPN client (second VPN server) on the Worker-AMI:
1. Copy the client certificates and key that you created for the desktop client above onto the Worker-AMI.
2. Create a configuration file similar to the one you created for the desktop client above, with the address and port of the second VPN server.
3. Add a 'push' rule that redirects traffic for the license server to the Scheduler-AMI, which can then route it to the license server.
A configuration file may have entries as follows:
    client
    dev tap
    proto tcp
    remote domU-12-31-38-00-25-71.compute-1.internal 1195
    push "route 172.31.45.197 255.255.255.255" #172.31.45.197 -> license srvr IP
    resolv-retry infinite
    nobind
    persist-key
    persist-tun
    ca /etc/openvpn/keys/ca.crt
    cert /etc/openvpn/keys/client.crt
    key /etc/openvpn/keys/client.key
    ns-cert-type server
    comp-lzo
    verb 3

6) Setting up the job manager launch configuration file:
1. In the MDCE_DEF.sh file, set the variables JOB_MANAGER_HOST and HOST_NAME to the VPN IP address of the Scheduler-AMI. You can simply use the IP address (e.g., 10.9.0.1) of the second VPN server for both variables, because the Scheduler-AMI will always serve as the VPN server. Note that you need to use the IP address of the second VPN server, since the Worker-AMIs connect to the job manager AMI over the second VPN (subnet 10.9.0.0).

7) Setting up the MATLAB worker launch configuration file:
1. In the MDCE_DEF.sh file on the Worker-AMI, you must set HOST_NAME to the Worker-AMI instance's VPN IP address. You will need to create a separate script to automate this, because the VPN IP address of each instance will be different.

8) Configure the routing table on the Scheduler-AMI:
1. Determine the IP address assigned to the license server by the first VPN server (e.g., 10.8.0.10) as well as the real IP address of the license server (e.g., 172.31.45.197).
2. Add a rule to the routing table on the Scheduler-AMI to forward traffic directed to 172.31.45.197 to the virtual IP 10.8.0.10, using the following command:
    route add -net 172.31.45.197 netmask 255.255.255.255 dev tap0 gw 10.8.0.10
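Step 7 calls for a separate script because every Worker-AMI has a different VPN address, and the routing rule of step 8 has to be recreated whenever the Scheduler-AMI restarts. The Python sketch below is one hypothetical way to script both; the function names are ours, and how you collect an instance's addresses (for example, from `ip -4 addr show tap0`) and where MDCE_DEF.sh lives are left to your installation. It picks the address that lies in the second VPN's subnet, rewrites the HOST_NAME assignment, and builds the step-8 route command as an argument list without running it.

```python
import ipaddress
import re
import shlex
from pathlib import Path


def pick_vpn_ip(addresses, vpn_cidr="10.9.0.0/24"):
    """Return the first address that lies inside the second VPN's subnet."""
    net = ipaddress.ip_network(vpn_cidr)
    for addr in addresses:
        if ipaddress.ip_address(addr) in net:
            return addr
    raise LookupError(f"no address in {vpn_cidr}")


def set_shell_var(text, name, value):
    """Set NAME="value" in shell-script text, replacing the first assignment."""
    pattern = re.compile(rf"^{re.escape(name)}=.*$", re.MULTILINE)
    line = f'{name}="{value}"'
    if pattern.search(text):
        # A function replacement avoids backslash/group escapes in `line`.
        return pattern.sub(lambda _match: line, text, count=1)
    return text.rstrip("\n") + "\n" + line + "\n"


def update_host_name(mdce_def, addresses, vpn_cidr="10.9.0.0/24"):
    """Step 7: write this Worker-AMI's VPN address into HOST_NAME."""
    ip = pick_vpn_ip(addresses, vpn_cidr)
    path = Path(mdce_def)
    path.write_text(set_shell_var(path.read_text(), "HOST_NAME", ip))
    return ip


def license_route_command(license_real_ip, license_vpn_ip, dev="tap0"):
    """Step 8: the route command as an argv list (nothing is executed)."""
    ipaddress.ip_address(license_real_ip)  # raises ValueError if malformed
    ipaddress.ip_address(license_vpn_ip)
    return ["route", "add", "-net", license_real_ip, "netmask",
            "255.255.255.255", "dev", dev, "gw", license_vpn_ip]


def as_shell(argv):
    """Render an argv list as a copy-pasteable shell command."""
    return shlex.join(argv)
```

A boot script on the Scheduler-AMI could pass license_route_command("172.31.45.197", "10.8.0.10") to subprocess.run(..., check=True) as root. On current Linux distributions the equivalent iproute2 form is ip route add 172.31.45.197/32 via 10.8.0.10 dev tap0.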

Every time you restart the Scheduler-AMI, the license manager will be assigned a new IP address. As a result, you must repeat the steps above after each Scheduler-AMI restart.

Network Setup for Using Third-Party Schedulers

A setup similar to the one described above may work for third-party schedulers, provided the schedulers can be configured to run in a VPN environment. Consult the scheduler vendors to discover appropriate mechanisms for enabling a MATLAB client to connect to the scheduler and the MATLAB workers to connect back to the MATLAB client.

Setting up the MATLAB Client on a User's Desktop

As we noted before, a MATLAB user's interaction with an Amazon EC2 cluster mirrors (with minor changes) their typical interaction with a regular cluster. For example, users may see performance deterioration because of virtualized compute resources and because the interaction with Amazon EC2 happens over the Internet. Additionally, users must be aware of the security and intellectual property concerns of transmitting data over the Internet and hosting it on an external service. Encourage your users to review the concerns described earlier in this document.

As a system administrator, you will need to set up Parallel Computing Toolbox configurations to enable your users to use the Amazon EC2 cluster. Your choice of scheduler determines how you set up these configurations. You can export these configurations in the form of MAT-files, which can be distributed to several users, who can then import them into their MATLAB sessions. Users can make multiple copies of these configurations and customize them for their project-specific settings. For your users, switching between schedulers and clusters requires changing only the configuration name. Detailed information on Parallel Computing Toolbox configurations is available (see reference 8).

Figure 12: Parallel Computing Toolbox configurations.

Setup for Using MathWorks Job Manager

The Parallel Computing Toolbox configuration for the MathWorks job manager requires that you supply the hostname of the Scheduler-AMI that runs a copy of the job manager. If you are using the setup described in the Network Setup section, you must supply the internal hostname as assigned by the VPN server (e.g., ip-10-9-0-1.ec2.internal for a server that is assigned the 10.9.0.1 address) as the job manager hostname (the LookupURL field in the configuration properties). In addition, you must install and configure any software (such as VPN or SSH) that you require to enable MATLAB users to connect to the Scheduler-AMI, to submit jobs, and to receive results (see the Network Setup section). To achieve this, configure and launch a VPN client as described.

Setup for Using Third-Party Schedulers

As noted above, the key requirement of a shared file system between users' desktop computers and cluster computers for the direct integration of various third-party schedulers is not met when the users' computers and the cluster computers reside on completely different networks. Therefore, you will need to set up custom scripts for users to be able to connect to the Amazon EC2 cluster. Customized scripts must reside both on the MATLAB users' desktop computers and on the Worker-AMI. For details on implementing these, review the Using Generic Scheduler Interface sections in Programming Distributed Jobs and Programming Parallel Jobs in the Parallel Computing Toolbox documentation. In addition, you must install any software (such as VPN or SSH) that you require for MATLAB users to connect to the Scheduler-AMI, to submit jobs, and to receive results.

Licensing

We strongly recommend reviewing The MathWorks Software License Agreement (SLA) and consulting The MathWorks regarding license usage before setting up MATLAB and parallel computing products on the Amazon EC2 service. Your current MATLAB Distributed Computing Server licenses can be used on cloud computing services. There are, however, certain usage restrictions that you must consider. In particular, review the "MATLAB Distributed Computing Server" section in the SLA. Contact your MathWorks sales representative to discuss your requirements.

License Management on Amazon EC2

You can use your organization's current license management infrastructure to serve MATLAB Distributed Computing Server licenses for workers running on Amazon EC2. To achieve this, the Worker-AMIs must be able to communicate with the license server that resides within your organization's network. One possible solution is outlined in the Network Setup section above. The advantage of this setup is that you do not need to obtain separate licenses or manage multiple license servers.

It is possible to establish a separate License-AMI on Amazon EC2 for running the Macrovision FLEXlm® license management software (which serves MathWorks licenses). However, the license manager needs to bind to a specific local IP or MAC address to let MATLAB Distributed Computing Server workers find the manager and check out the appropriate number of keys. Because an AMI instance can be launched anywhere on the Amazon network, the IP or MAC address will change with each launch. This change will cause the license checkout process to fail, which in turn will cause user-submitted batch jobs and interactive sessions to fail. Note that the "static IP addresses" offered by Amazon services are external IP addresses and therefore cannot be used by the license manager, which needs the internal address of the physical machine on which it runs.

Therefore, you will need to keep the License-AMI running all the time. If for some reason the License-AMI fails or is shut down, you must manually redesignate a new instance for managing the licenses, and your Amazon EC2 cluster will remain unusable until the redesignation process is complete. Note that the number of license server redesignations is limited to four per year, after which you will need to contact The MathWorks for each redesignation.

Support Services

Setting up MATLAB parallel computing products, tools, and services to run on Amazon EC2 requires advanced installation maneuvers. The MathWorks Consulting Group is available to support these activities. Consult your sales representative or contact the Consulting Group directly: www.mathworks.com/services/consulting

References

1. Amazon Elastic Compute Cloud: http://aws.amazon.com/ec2
2. Amazon Simple Storage Service: http://aws.amazon.com/s3
3. Amazon Elastic Block Store: http://aws.amazon.com/ebs
4. Amazon Elastic Compute Cloud: Getting Started: http://docs.amazonwebservices.com/AWSEC2/2008-05-05/GettingStartedGuide
5. Installation Guide: MATLAB Distributed Computing Server. Documentation, The MathWorks
6. User Guide: MATLAB Distributed Computing Server. The MathWorks
7. User Guide: Parallel Computing Toolbox. The MathWorks
8. Programming with User Configurations. Parallel Computing Toolbox User Guide: www.mathworks.com/distconfig
9. OpenVPN: www.openvpn.net/index.php/documentation/howto.html

MATLAB and Simulink are registered trademarks of The MathWorks, Inc. See www.mathworks.com/trademarks for a list of additional trademarks. Other product or brand names may be trademarks or registered trademarks of their respective holders.

© 2008 The MathWorks, Inc.