10/9/2016

Running COMSOL in parallel on clusters - 1001 - Knowledge Base

Running COMSOL in parallel on clusters
Solution Number: 1001
Title: Running COMSOL in parallel on clusters
Platform: Windows, Linux
Applies to: All Products
Versions: All versions
Categories: Solver
Keywords: solver memory parallel cluster

Problem Description
This solution describes how you enable distributed parallelization (cluster jobs) in COMSOL.

Solution
COMSOL supports two modes of parallel operation that can be combined: shared-memory parallel operations and distributed-memory parallel operations, including cluster support. This solution is dedicated to distributed-memory parallel operations. For shared-memory parallel operations, see Solution 1096.
COMSOL can distribute computations on compute clusters using the MPI model. One large problem can be distributed across many compute nodes. Parametric sweeps can also be distributed, with individual parameter cases assigned to each cluster node.
Cluster computing is supported on Windows (Windows HPC Server 2008/R2) and Linux, including common schedulers such as LSF, PBS, and Sun Grid Engine (SGE, also known as Oracle Grid Engine). As of version 4.3, COMSOL by default uses Hydra to initialize the MPI environment on Linux.
NOTE: To use COMSOL on a compute cluster, you need the Floating Network License (FNL) option.
At the bottom of this page are quick guides that explain how to get started with cluster computing and how to get more information.
Some useful tips and troubleshooting guides are provided below.

Fundamentals
The following terms occur frequently when describing the hardware for cluster computing and shared-memory parallel computing:
Compute node: The compute nodes are where the distributed computing occurs. The COMSOL server resides in a compute node and communicates with other compute nodes using MPI (Message Passing Interface).
Host: The host is a physical machine with a network adapter and a unique network address. The host is part of the cluster. It is sometimes referred to as a physical node.
Core: The core is a processor core used in shared-memory parallelism by a compute node with multiple processors.

The number of hosts used and the number of compute nodes are usually the same. For some special problem types, such as very small problems with many parameters, it might be beneficial to use more than one compute node on one host.
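
To make the relationship between hosts and cores concrete, the following sketch summarizes a scheduler hostfile. It assumes a plain-text file with one line per allocated core (the layout that schedulers such as LSF and PBS commonly provide) and an even allocation across hosts. The function name summarize_hostfile is hypothetical and not part of COMSOL.

```shell
#!/bin/sh
# Hypothetical helper: print "<number of hosts> <cores per host>" for a hostfile
# that lists one host name per allocated core (assumed format).
summarize_hostfile() {
    hostfile=$1
    # sort -u also removes duplicates that are not adjacent, unlike plain uniq
    nn=$(sort -u "$hostfile" | wc -l | tr -d ' ')
    total=$(wc -l < "$hostfile" | tr -d ' ')
    # Guard against an empty hostfile before dividing
    [ "$nn" -gt 0 ] || return 1
    # Assumes every host was given the same number of cores
    echo "$nn $((total / nn))"
}
```

With one compute node per host, the first number is the compute node count and the second is the number of cores each node can use for shared-memory parallelism. COMSOL's command line accepts these as -nn and -np in the versions I am aware of; confirm the option names in the reference manual for your release.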

Cluster distribution, Windows and Linux
Example models for cluster testing are included in the Model Library:
COMSOL_Multiphysics/Tutorial_Models/micromixer_cluster
COMSOL_Multiphysics/Tutorial_Models/thermal_actuator_jh_distributed

Troubleshooting
https://www.comsol.co.in/support/knowledgebase/1001/

Your first step is to make sure you have the latest release installed, which is available from the Product Download page. Also use Help > Check for Updates to install the latest software updates; these are also available from the Product Updates page.
Error messages relating to GTK
GLib-GObject-WARNING **: invalid (NULL) pointer instance
GLib-GObject-CRITICAL **: g_signal_connect_data: assertion `G_TYPE_CHECK_INSTANCE (instance)' failed
Gtk-CRITICAL **: gtk_settings_get_for_screen: assertion `GDK_IS_SCREEN (screen)' failed
...

These errors typically occur when COMSOL's Java component tries to display an error message in a graphical window, but no graphical display is available. The recommended solution is to disable file locking. Add the row
-Dosgi.locking=none

to three of COMSOL's *.ini configuration files. Open the following files in a text editor:
/usr/local/comsol51/multiphysics/bin/glnxa64/comsolcluster.ini
/usr/local/comsol51/multiphysics/bin/glnxa64/comsolclustermphserver.ini
/usr/local/comsol51/multiphysics/bin/glnxa64/comsolclusterbatch.ini

In each of these files you will find several -Dosgi.* rows. Add the -Dosgi.locking=none row directly below these. Please note that the options are case sensitive.
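
If you prefer to script the edit rather than use a text editor, something along the following lines may work. This is a minimal sketch, not a COMSOL-provided tool: the function name add_locking_option is hypothetical, and it assumes each -Dosgi.* row starts at the beginning of its line. It keeps a .bak copy and does nothing if the row is already present.

```shell
#!/bin/sh
# Hypothetical helper: insert -Dosgi.locking=none below the last -Dosgi.* row
# of one .ini file. Assumes the -Dosgi.* rows start in column one.
add_locking_option() {
    ini=$1
    # Skip the file if the exact row already exists (keeps the edit idempotent)
    if grep -qxF -- '-Dosgi.locking=none' "$ini"; then
        return 0
    fi
    cp "$ini" "$ini.bak"
    awk '
        { lines[NR] = $0 }
        /^-Dosgi\./ { last = NR }          # remember the last -Dosgi.* row
        END {
            for (i = 1; i <= NR; i++) {
                print lines[i]
                if (i == last) print "-Dosgi.locking=none"
            }
            # No -Dosgi.* rows found: append at the end instead
            if (last == 0) print "-Dosgi.locking=none"
        }
    ' "$ini.bak" > "$ini"
}
```

The function would be called once for each of the three files listed above, for example add_locking_option /usr/local/comsol51/multiphysics/bin/glnxa64/comsolcluster.ini, with write permission on the installation directory.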
Check that the nodes can access the license manager
Linux: Log in to each node and run the command
comsol batch -inputfile /usr/local/comsol50/multiphysics/models/COMSOL_Multiphysics/Equation-Based_Models/point_source.mph -outputfile out.mph

The command above should be issued on one line. /usr/local/comsol50 is assumed to be your COMSOL installation directory. The /usr/local/comsol50/multiphysics/bin directory, where the comsol script is located, is assumed to be included in the system PATH. Make sure you have write permissions for ./out.mph. No error messages should be produced; if there are any, you may have a license manager connectivity problem.
Windows HPCS: Log in to each node with remote desktop and start the COMSOL Desktop GUI. No error messages should be displayed.
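
For the Linux case, logging in to every node by hand becomes tedious on larger clusters. The sketch below runs the same batch command on each host listed in a hostfile. It assumes the nodes are reachable with ssh and that comsol is on the PATH of non-interactive shells; the function name check_license_on_nodes and the REMOTE variable are hypothetical. Setting REMOTE=echo gives a dry run that only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper: run the license check from the text on every host.
#   $1 = hostfile (one host name per line, duplicates allowed)
#   $2 = path to a small model file, e.g. the point_source model named above
# REMOTE is the command used to reach a node (assumed: ssh).
check_license_on_nodes() {
    hostfile=$1
    model=$2
    remote=${REMOTE:-ssh}
    failed=0
    for host in $(sort -u "$hostfile"); do
        if $remote "$host" comsol batch -inputfile "$model" -outputfile out.mph; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    # Non-zero if any node failed, so the result can be used in scripts
    return $failed
}
```

A FAIL line only says that the command returned a non-zero status on that host; the COMSOL output printed above it is what indicates whether the cause is license manager connectivity.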
Issues with Infiniband-based Linux clusters
Update the Infiniband drivers to the latest software version. If you cannot update at this time, add the command-line option -mpifabrics shm:tcp or -mpifabrics tcp. This will use TCP for communication between nodes.
For more advice on how to troubleshoot Infiniband issues, please refer to the section Troubleshooting Distributed COMSOL and MPI in the COMSOL Multiphysics Reference Manual.
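
As an illustration of where the fabric option might go, the following sketch assembles a cluster batch command and adds -mpifabrics only when requested. The placement of the option before the batch target is an assumption based on how the other cluster options are written in the job scripts on this page. The function name build_comsol_cmd and the FABRIC variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a COMSOL cluster batch command line.
#   $1 = hostfile, $2 = input model, $3 = output model
# FABRIC (optional) is passed to -mpifabrics; the text names shm:tcp and tcp.
# Assumption: -mpifabrics belongs with the cluster options, before "batch".
build_comsol_cmd() {
    hostfile=$1
    infile=$2
    outfile=$3
    echo "comsol -clustersimple -f $hostfile ${FABRIC:+-mpifabrics $FABRIC }batch -inputfile $infile -outputfile $outfile"
}
```

Printing the command instead of running it makes it easy to inspect the result before pasting it into a job script, for example with FABRIC=shm:tcp build_comsol_cmd comsol_hostfile in.mph out.mph.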
Problems with the Cluster Computing feature in the model tree
If you get the error message "Process status indicates that process is running", it means that the *.status file in the batch directory indicates that the previous job is still running. In some cases this can happen even if the job is not actually running, for example if the job halted or was terminated in an uncontrolled way. To work around this problem, perform these steps:
1. Cancel any running jobs in the Windows HPCS Job Manager or other scheduler that you use.
2. In COMSOL, go to the External Process page at the bottom right corner of the COMSOL Desktop.
3. Click the Clear Status button. If the error still remains, manually delete all the files in the batch directory.
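
On Linux, the manual cleanup in the last step could be scripted roughly as follows. This is a sketch under stated assumptions: the batch directory path is whatever you configured for the Cluster Computing feature, the function name clear_batch_dir is hypothetical, and -maxdepth is a GNU/BSD find extension rather than strict POSIX. Only run it after confirming that no job is still active.

```shell
#!/bin/sh
# Hypothetical helper: delete all regular files directly inside a batch
# directory, printing each path as it is removed. Subdirectories are kept.
clear_batch_dir() {
    batchdir=$1
    if [ -z "$batchdir" ] || [ ! -d "$batchdir" ]; then
        echo "not a directory: $batchdir" >&2
        return 1
    fi
    find "$batchdir" -maxdepth 1 -type f -print -exec rm -f {} +
}
```

The printed list doubles as a record of what was removed, which can help when reporting the problem to support.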

Error messages due to communication problems between Linux nodes
If you get error messages, make sure that the compute nodes can access each other over TCP/IP and that all nodes can access the license manager in order to check out licenses. If you run the SSH protocol between the hosts on a Linux cluster, you need to pregenerate the keys in order to prevent the nodes from asking each other for passwords as soon as communication is initiated:
# generate the keys
ssh-keygen -t dsa
ssh-keygen -t rsa
# copy the public key to the other machine
ssh-copy-id -i ~/.ssh/id_rsa.pub user@hostname
ssh-copy-id -i ~/.ssh/id_dsa.pub user@hostname
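
Once the keys are distributed, it can be worth confirming that no host still asks for a password. The sketch below relies on the OpenSSH option BatchMode=yes, which makes ssh fail instead of prompting. The function name check_passwordless_ssh and the SSH override variable are hypothetical, and the five-second timeout is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: verify key-based login to every host in a hostfile.
# BatchMode=yes disables password prompts, so a missing key shows up as a failure.
# SSH can be overridden (e.g. SSH=true) to exercise the loop without a network.
check_passwordless_ssh() {
    hostfile=$1
    ssh_cmd=${SSH:-ssh}
    status=0
    for host in $(sort -u "$hostfile"); do
        if $ssh_cmd -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "OK   $host"
        else
            echo "FAIL $host (password prompt or connection problem)"
            status=1
        fi
    done
    return $status
}
```

Because MPI processes connect between compute nodes, a check from the login node alone may not be sufficient; running the same function from one of the compute nodes gives a more representative result.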

Cloud computing
COMSOL 4.3a introduced support for cloud computing through Amazon Elastic Compute Cloud (Amazon EC2). See the PDF guide Running COMSOL on the Amazon Cloud for further information.
Hardware Recommendations
See the knowledge base solution on Selecting hardware for clusters.
See Also
See also COMSOL and Multithreading.
Example of LSF job submission script
#!/bin/sh
# Rerun process if node goes down, but not if job crashes
# Cannot be used with interactive jobs.
#BSUB -r
# Job name
#BSUB -J comsoltest
# Number of processes.
#BSUB -n 20
# Redirect screen output to output.txt
#BSUB -o output.txt
rm -rf output.txt
# Create hostfile for COMSOL
cat $LSB_DJOB_HOSTFILE | uniq > comsol_hostfile
# Launch the COMSOL batch job
comsol -clustersimple -f comsol_hostfile batch -inputfile in.mph -outputfile out.mph

Example of PBS job submission script
#!/bin/bash
###############################################################################
#
export nn=2
export np=8
export inputfile="simpleParametricModel.mph"
export outputfile="outfile.mph"
#
# Inside the here-document, \$ defers expansion to the compute node;
# unescaped variables are expanded by the submitting shell.
qsub -V -l nodes=${nn}:ppn=${np} <<__EOF__
#
#PBS -N COMSOL
#PBS -q dp48
#PBS -o $HOME/cluster/job_COMSOL_$$.log
#PBS -e $HOME/cluster/job_COMSOL_$$.err
#PBS -r n
#PBS -m a -M email@domain.com
#
echo "------------------------------------------------------------"
echo "Starting job at: \$(date)"
echo
#
cd \${PBS_O_WORKDIR}
echo "Current working directory is: \$(pwd)"
#
np=\$(wc -l < \$PBS_NODEFILE)
echo "Running on \${np} processes (cores) on the following nodes:"
cat \$PBS_NODEFILE
#
cat \$PBS_NODEFILE | uniq > comsol_nodefile
echo "parallel COMSOL RUN"
comsol -clustersimple -f comsol_nodefile batch -mpiarg -rmk -mpiarg pbs -inputfile $inputfile -outputfile $outputfile -batchlog batch_COMSOL__$$.log
echo
echo "Job finished at: \$(date)"
echo "------------------------------------------------------------"
#
__EOF__
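
When a job script is passed to qsub through a here-document, as above, the submitting shell expands every unescaped variable and command substitution before the job is queued. Values that exist only on the compute node, such as PBS_NODEFILE, therefore need a backslash in front of the dollar sign. The sketch below shows the difference, with cat standing in for qsub so that it runs anywhere; WHERE is a hypothetical variable.

```shell
#!/bin/sh
# Demonstrates expansion inside an unquoted here-document.
# cat stands in for qsub; WHERE is a hypothetical variable set at submission.
WHERE="submit host"
script=$(cat <<__EOF__
echo "expanded at submission: $WHERE"
echo "expanded in the job: \$WHERE"
__EOF__
)
# Show the text that the scheduler would actually receive
printf '%s\n' "$script"
```

The first line of the printed script contains the literal value, while the second still contains the variable reference and is resolved only when the job runs.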

Related Files
cluster_install_linux_50_169.pdf    1.3 MB
cluster_install_linux_50_169.pptx   1.2 MB
cluster_install_win_50.pdf          1,007 KB
cluster_install_win_50.pptx         1.3 MB

Disclaimer
COMSOL makes every reasonable effort to verify the information you view on this page. Resources and documents are provided for your information only, and COMSOL makes no explicit or implied claims to their validity. COMSOL does not assume any legal liability for the accuracy of the data disclosed. Any trademarks referenced in this document are the property of their respective owners. Consult your product manuals for complete trademark details.
