
Available online at www.sciencedirect.com


Procedia Computer Science 18 (2013) 10 – 19

2013 International Conference on Computational Science

High performance computing in biomedical applications
S. Bastrakov (a), I. Meyerov (a,*), V. Gergel (a), A. Gonoskov (b), A. Gorshkov (a,b), E. Efimenko (b), M. Ivanchenko (a), M. Kirillin (b), A. Malova (a), G. Osipov (a), V. Petrov (a), I. Surmin (a), A. Vildemanov (a)
(a) Lobachevsky State University of Nizhni Novgorod, 23 Prospekt Gagarina, 603950 Nizhni Novgorod, Russia
(b) Institute of Applied Physics of the Russian Academy of Sciences, 46 Ul'yanov Street, 603950 Nizhny Novgorod, Russia

Abstract

This paper presents the results of the development of a biomedical software system at the Nizhni Novgorod State University High Performance Computing Competence Center. We consider four main fields: plasma simulation, heart activity simulation, brain sensing simulation, and molecular dynamics simulation. The software system is aimed at large-scale simulation on cluster systems with high efficiency and scalability. We demonstrate current results of the numerical simulation, analyze performance, and propose ways to improve efficiency.

© 2013 The Authors. Published by Elsevier B.V. Selection and/or peer-review under responsibility of the organizers of the 2013 International Conference on Computational Science
Keywords: scientific computing; large-scale problems; highly scalable software; software for heterogeneous clusters; GPU computing; plasma simulation; heart activity simulation; brain sensing simulation; molecular dynamics simulation

1. Introduction

One of the research areas of the Nizhni Novgorod State University High Performance Computing Competence Center [1] is the development of a unified software infrastructure for solving applied biomedical tasks on modern high performance systems. Biomedicine is one of the fast growing fields in terms of utilizing high performance computing. A key limiting factor for HPC in this field is the lack of software solutions for modeling real-world problems using new approaches in biophysics, pharmaceutics and the physics of medical instrument design. The main goal of the project is to establish collaboration between life sciences researchers and software engineers with an HPC background. This allows elaborating a common methodology of data processing (e.g. visualization) and sharing implementations of mathematical kernels, thus speeding up the development of software capable of utilizing modern high performance systems.

* Corresponding author. Tel.: +7-831-462-33-56; fax: +7-831-462-30-85. E-mail address:

1877-0509 © 2013 The Authors. Published by Elsevier B.V. Selection and peer review under responsibility of the organizers of the 2013 International Conference on Computational Science doi:10.1016/j.procs.2013.05.164

The present paper includes four sections related to research in the simulation of plasma, heart activity, brain sensing, and molecular dynamics. Each section describes the particular problem together with the solution method and provides experimental justification. Common trends are discussed in the conclusions section.

2. Plasma simulation

Simulation of plasma dynamics, and of the interaction of high intensity laser pulses with various targets in particular, is one of the currently high-demand areas of computational physics. It has various biomedical applications: creation of compact sources for hadron therapy for treating oncological diseases, creation of short-lived isotope factories for bioimaging, and development of tools for research on intramolecular and intraatomic processes. Due to the high degree of nonlinearity and geometric complexity of the problem, plasma dynamics research is often based on simulation with the Particle-in-Cell (PIC) method [2]. Solving some practical problems requires simulation of 10^9 and more particles on 10^8 grid cells, thus demanding hundreds or thousands of nodes of a cluster system. As more and more supercomputers share their performance among CPUs and GPUs, there is a growing interest in high-performance implementations of the Particle-in-Cell algorithm for heterogeneous systems along with traditional cluster systems.

PICADOR [3] is a tool for large-scale three-dimensional Particle-in-Cell plasma simulation on heterogeneous cluster systems. It is aimed at research of high intensity laser applications, including laser-driven particle acceleration and generation of radiation with tailored properties. PICADOR runs on cluster systems with one or several CPUs and GPUs per node, using MPI for data exchanges and OpenMP / OpenCL in each process. Implementation details are given in [3].

The Particle-in-Cell method is based on a self-consistent mathematical model representing plasma as an ensemble of negatively charged electrons and positively charged ions. The method implies numerically solving the equations of particle motion together with Maxwell's equations on a discrete grid. The entire particle ensemble is represented by a smaller number of so-called super-particles that have the same charge-to-mass ratio as the real particles, which results in equivalent plasma dynamics. The Particle-in-Cell method operates on two sets of data: each particle of a particular type is characterized by its position and velocity (the type determines the mass and charge), while the field and current density values are defined on a uniform space grid. Each time step comprises four stages: field update, field interpolation, particle push, and current deposition. We use the finite-difference time-domain method [4] to update the grid values of the electric and magnetic fields. During the particle push stage the field values are interpolated to compute the corresponding force; for each particle only the values in the eight nearest grid nodes are used. The Boris method is used to integrate the equations of particle motion. The current deposition stage implies computing the current density based on the positions and velocities of the particles: each particle contributes to the current density values in the eight nearest grid nodes. The final current density values are determined by summing the contributions of all particles and are used during the next field update, thus closing the loop.

The problem is decomposed according to a territorial principle: each MPI process handles a subarea, keeping track of the field values and particles located in the corresponding part of the simulation area. To keep all domain information relevant, the data referring to the currents, fields and particles are exchanged. All MPI exchanges occur only between processes handling neighboring domains.

To analyze the accuracy of the implementation we performed a simulation of laser wakefield acceleration with the parameters given in [5] using 1024 CPU cores. The resulting plots of the electric field in the direction of laser polarization (left) and in the direction of laser propagation (right) are depicted in Fig. 1. Our results are in good agreement with the corresponding results of several widely used Particle-in-Cell codes compared in [5] (Fig. 1 (c) and (d)). The plots are done with the MathGL library [6] for scientific data visualization.
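The particle push stage described above is commonly implemented with the Boris scheme. The following minimal, non-relativistic sketch illustrates the idea; it is not PICADOR's production code, and the names (q, m, dt) are generic:

```python
# Illustrative, non-relativistic Boris pusher for one super-particle.
# Half electric acceleration, magnetic rotation, second half acceleration.

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def boris_push(x, v, E, B, q, m, dt):
    """Advance position x and velocity v by one step dt in fields E, B
    (all three-component vectors given as Python lists)."""
    k = q * dt / (2.0 * m)
    # first half acceleration by the electric field
    v_minus = [v[i] + k * E[i] for i in range(3)]
    # rotation around the magnetic field
    t = [k * B[i] for i in range(3)]
    t2 = sum(c * c for c in t)
    v_prime = [v_minus[i] + c for i, c in enumerate(cross(v_minus, t))]
    s = [2.0 * c / (1.0 + t2) for c in t]
    v_plus = [v_minus[i] + c for i, c in enumerate(cross(v_prime, s))]
    # second half acceleration and position update
    v_new = [v_plus[i] + k * E[i] for i in range(3)]
    x_new = [x[i] + v_new[i] * dt for i in range(3)]
    return x_new, v_new
```

With E = 0 the update is a pure rotation of the velocity around B, so the particle's kinetic energy is conserved exactly; this property is the main reason the Boris scheme is the de facto standard pusher in PIC codes.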

Fig. 1. Results of the simulation of laser wakefield acceleration with the parameters given in [5] using 1024 CPU cores. Left: electric field in the direction of laser polarization. Right: electric field in the direction of laser propagation.

As reported in [3], PICADOR strong scaling efficiency is 85% on 2048 CPU cores relative to 512. The weak scaling efficiency is 90% on 2048 CPU cores relative to 512 cores and 64% on 512 NVIDIA Tesla X2070 GPUs (229 376 CUDA cores) relative to 16 GPUs. Execution on one NVIDIA Tesla X2070 GPU achieves a 3-4 times speedup in comparison to 2x Intel Xeon L5630 (8 CPU cores in total).

Some problems require simulation with a highly nonuniform particle distribution. In this case uniform decomposition of the computational area between processes results in overall poor performance. A way to address this obstacle is the usage of different decomposition schemes. We use a static rectilinear partitioning scheme [7] with the sum of the particle count and the grid node count as the workload estimate. As an example of a simulation with nonuniform particle density we consider the simulation of laser compression using a 512 x 512 x 512 grid, 21 million particles, and 10000 time steps on 512 MPI processes. Fig. 2 presents a comparison of computational time for uniform decomposition (left) and rectilinear decomposition (right), showing the distribution of computational time among the main stages of the algorithm for each process. According to these results, usage of rectilinear decomposition improves efficiency by about 2.5 times. It is worthy of notice that some problems require not just static decomposition but dynamic load balancing, which can also be implemented using the rectilinear partitioning scheme.

Fig. 2. Computational time for each process in the simulation of laser compression using 512 CPU cores. Left: data for uniform decomposition. Right: data for static rectilinear decomposition.
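In one dimension, rectilinear partitioning with the sum of particle and grid node counts as the workload estimate reduces to choosing contiguous slab boundaries that minimize the maximum per-process load. A minimal sketch using binary search on the admissible load; this is an illustration, not the multi-dimensional scheme of [7]:

```python
# weights[i] = particles + grid nodes in cell slab i (illustrative metric)

def can_split(weights, nparts, cap):
    """Check whether the slabs fit into nparts contiguous parts,
    each with total load at most cap."""
    parts, load = 1, 0
    for w in weights:
        if w > cap:
            return False
        if load + w > cap:
            parts += 1
            load = w
        else:
            load += w
    return parts <= nparts

def rectilinear_splits(weights, nparts):
    """Return (optimal max load, list of part start indices after index 0)."""
    lo, hi = max(weights), sum(weights)
    while lo < hi:                      # binary search on the max load
        mid = (lo + hi) // 2
        if can_split(weights, nparts, mid):
            hi = mid
        else:
            lo = mid + 1
    bounds, load = [], 0                # recover boundaries for load lo
    for i, w in enumerate(weights):
        if load + w > lo:
            bounds.append(i)
            load = w
        else:
            load += w
    return lo, bounds
```

For the example weights [5, 1, 1, 1, 5, 1, 1, 1] and two processes, the optimal boundary lies in the middle and both processes get a load of 8.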
3. Heart activity simulation

The study of the development mechanisms of various arrhythmias, of diagnostic methods, and of ways of prevention is extremely important because in economically developed countries cardio-vascular diseases are the world's largest killer.

Among them the largest killer is so-called sudden cardiac death. Research on heart arrhythmia is performed by medical doctors, biologists, physicists and experts in mathematical simulation. The research of heart function is entering a new stage due to two factors: sufficiently realistic cardiac tissue models based on the newest electrophysiology data, and access to high performance systems capable of efficient simulation of complicated spatio-temporal cardiac processes.

Cardiac tissues can be simplistically considered as a medium consisting of self-oscillatory and excitatory cells, and the processes taking place in them can be described as an evolution of some state variables (electric membrane potential, ion channel conductivities, ion currents). The heart is a dynamical system, so a mathematical model of the heart can be obtained by analyzing the corresponding cell models. The resulting mathematical model of the heart is a system of a massive number (up to 10^10) of ordinary differential equations describing its electric activity. Adequate analysis also requires accounting for the mechanical movement of the cardiac tissue, which results in a large number of additional equations. Therefore computer simulation of the heart using the specified models is only possible on cluster systems. There are several groups researching heart function [8-16] and creating software for its simulation.

In this paper we describe the software Virtual Heart aimed at the simulation of cardiac activity. We use a multidomain heart model (Sachse F.B. et al. [17]) and highly realistic, biologically relevant models for heart segments (Hodgkin-Huxley [18], Fink-Noble [19], Courtemanche [20], Nygren [21], Stewart [22], Grandi-Bers [9], Hoper [23]) and account for electromechanical interactions [16]. Our software supports all models from the open CellML repository [24] and has several preset models. The software allows adjusting the models to fit specific personal parameters and testing treatment methods. It is capable of computing the current state of the heart, its evolution, integral heart characteristics (e.g. a virtual electrocardiogram), and data concerning the influence of various electric, mechanic and drug-induced impacts. The multidomain model allows for efficient research into the control of dynamic processes using global and localized external electric stimulation. Another important aspect is a database containing virtual electrocardiograms, shots of spatio-temporal activity, temporal realizations of stresses at the normal regime and at various arrhythmias, etc. This database can be used to help diagnostics of cardio-vascular conditions.

The input data is preprocessed MRI data. We load the heart geometry as a tetrahedral finite element mesh. Based on that data we build a grid and perform heart segmentation into ventricles, auricles, the bundle of His and Purkinje fibers, the sinoatrial and atrioventricular nodes, and fibroblasts. Simulation consists of several consecutive stages, each using the results of the previous one. After mesh normalization the FEM assembler is initialized to assemble the mass matrix and the stiffness matrix, represented in CSR sparse matrix format. Then we integrate the differential equations describing electric physiology; each iteration corresponds to an ODE integration step using the explicit Euler method with adaptive step size. These equations contain an additional term accounting for intercellular interactions, which depends on the previously computed Laplacian and electric potential. The current values are used as the right-hand side while solving the Poisson equation, resulting in the electric potential in the intercellular space. Based on the potential at the mesh nodes we compute the Laplacian of the field, which is then used to compute external currents for cardiac myocytes at the mesh nodes. The simulation can be done in sequential and parallel modes. The results are saved in vtk format supported by Paraview [25].

Fig. 3 presents 2D simulation results obtained on the Nizhni Novgorod State University cluster. The left side shows initiation and propagation of electric (top) and mechanical (bottom) waves of activity; electric activity results in delayed mechanical activity, which fits the experimental data. The right side shows close to linear scalability on the cluster system for grid sizes from 90 x 90 to 600 x 600 (data for different grid sizes are highlighted with color). A high degree of heart segmentation allows for capturing reliable spatio-temporal effects. The presented software uses state-of-the-art models based on biologically relevant systems of ODEs and is capable of performing large-scale simulation of cardiac dynamics with high precision. It can be used in research for large-scale simulation of spatio-temporal heart activity and of integral characteristics such as the virtual electrocardiogram, thus enabling the development of new ways to handle arrhythmias.
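The integration step (explicit Euler with adaptive step size chosen by step doubling) can be sketched as follows. The FitzHugh-Nagumo model is used here as a compact stand-in for the biologically detailed CellML models, and the tolerances and names are illustrative:

```python
# Sketch: explicit Euler with step-doubling step-size control.
# fhn_rhs is the FitzHugh-Nagumo cell model; I_ext plays the role of
# the intercellular coupling term mentioned in the text.

def fhn_rhs(state, I_ext=0.5):
    v, w = state
    a, b, eps = 0.7, 0.8, 0.08
    return [v - v ** 3 / 3.0 - w + I_ext,
            eps * (v + a - b * w)]

def euler_adaptive(state, t_end, h=0.1, tol=1e-3):
    t = 0.0
    while t < t_end:
        h = min(h, t_end - t)
        f = fhn_rhs(state)
        full = [s + h * df for s, df in zip(state, f)]
        # compare one full step against two half steps
        half = [s + 0.5 * h * df for s, df in zip(state, f)]
        f2 = fhn_rhs(half)
        two = [s + 0.5 * h * df for s, df in zip(half, f2)]
        err = max(abs(u - v) for u, v in zip(full, two))
        if err > tol:
            h *= 0.5          # reject the step and retry
            continue
        state, t = two, t + h
        if err < tol / 4:
            h *= 2.0          # accept and allow a larger step
    return state
```

In the real software the right-hand side is a stiff, high-dimensional CellML model evaluated at every mesh node, but the accept/reject logic of the step-size control is the same.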

Fig. 3. Results of simulation using a 2D cardiac muscle model for research on electromechanical interactions.

The software is cross-platform; it uses Intel TBB [26] for shared memory systems and MPI for distributed memory systems. It is capable of using PETSc [27], Intel MKL PARDISO [28], and the solver developed at the University of Nizhni Novgorod [29] to solve sparse systems of linear equations.

4. Monte Carlo simulation of brain sensing by optical diffuse spectroscopy

Modern advances in neurosciences require novel diagnostic techniques for non-invasive brain sensing. Traditional techniques such as magnetic resonance imaging (MRI) or electroencephalography (EEG) are expensive, and their application is limited by high infrastructure requirements. Recent studies propose the application of optical diagnostics for non-invasive brain sensing [30]. Brain activity is accompanied by blood oxygenation variation in the active area of the cortex, which results in a change of its local optical properties. By illuminating a certain cortex area with radiation at the wavelengths corresponding to the maximum difference in absorption of oxy- and deoxyhemoglobin, one can effectively monitor oxygenation dynamics in the target area [31]. This technique is called optical diffuse spectroscopy (ODS).

The stochastic character of the propagation of multiply scattered photons in the human head does not allow an accurate prediction of the probing intensity distribution by existing theoretical approaches. In this situation numerical methods of light propagation in media with complex geometry can provide a beneficial solution to predict the probing light distribution and to determine the measurement volume from which the useful signal is detected. The existing numerical methods for the calculation of light propagation in turbid media can be separated into two classes: finite element methods and ray tracing techniques. E.g. Dehghani et al. have implemented a finite-difference approach in software for diffuse optical imaging, which resulted in the development of the widely used NIRFAST code [32]. An alternative tool is the statistical Monte Carlo (MC) approach, based on the simulation of a large number of random photon trajectories and further statistical analysis of the obtained results [33-35]. It has been generally considered that the Monte Carlo method is the gold standard to model light transport in multilayered heterogeneous tissues; however, considerable computational load is a significant limitation for its wide use.

We employ a ray tracing technique based on the Monte Carlo method. For the problem of light propagation in a turbid medium it relies on the simulation of a large number of random photon trajectories, while the shape of each trajectory depends on the optical properties and geometry of the medium. In order to simulate the operation of the optical brain sensing described above, we have to simulate photon propagation in multilayered heterogeneous tissues with complex geometry mimicking the human head [33]. The most accurate simulation implies accounting for the physiologic geometry (obtained from MRI diagnostic data). At the same time, we need to simulate about 10 billion photon trajectories to obtain adequate statistics; e.g. the simulation of 500 million photon trajectories in a human head with complex geometry, obtained from MRI data, takes about 23 hours on a cluster (10 nodes with 2 Intel Xeon L5630 CPUs per node).

A number of programs to simulate light propagation in complex heterogeneous tissues have been developed by now. The classical MCML code is aimed at the simulation of infinitely narrow beam propagation in a planar multi-layered model medium [33].
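The core of an MCML-style Monte Carlo loop for a single photon packet can be sketched as follows. This toy version assumes a homogeneous semi-infinite medium with isotropic scattering, unlike the layered triangle-mesh geometry of the actual code; all parameter names are illustrative:

```python
import math
import random

# Toy Monte Carlo photon transport in a homogeneous semi-infinite
# turbid medium: exponential free paths, weight absorption, and
# isotropic scattering. Returns the weight escaping the surface z = 0.

def trace_photon(mu_a, mu_s, rng=random):
    x = [0.0, 0.0, 0.0]
    d = [0.0, 0.0, 1.0]          # launched downward into the medium
    w = 1.0                       # photon packet weight
    mu_t = mu_a + mu_s
    while w > 1e-4:
        step = -math.log(rng.random()) / mu_t   # free path length
        if x[2] + d[2] * step < 0.0:
            return w              # photon escapes through the surface
        x = [x[i] + d[i] * step for i in range(3)]
        w *= mu_s / mu_t          # deposit the absorbed fraction
        # isotropic scattering: draw a new direction uniformly
        cos_t = 2.0 * rng.random() - 1.0
        phi = 2.0 * math.pi * rng.random()
        sin_t = math.sqrt(1.0 - cos_t * cos_t)
        d = [sin_t * math.cos(phi), sin_t * math.sin(phi), cos_t]
    return 0.0                    # packet weight exhausted inside

def reflectance(n_photons, mu_a, mu_s, seed=1):
    rng = random.Random(seed)
    return sum(trace_photon(mu_a, mu_s, rng) for _ in range(n_photons)) / n_photons
```

Because trajectories are statistically independent, the outer loop over photons parallelizes trivially, which is the source of the near-linear cluster scaling reported for the actual code.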

Later codes extended the Monte Carlo approach to a voxelized model of the medium [34], and Li et al. described a Monte Carlo code with mesh-based medium boundaries [36].

Our system for the simulation of light propagation in complex heterogeneous tissues [35] works with media whose boundaries and inner surfaces are constructed from triangle meshes. For an accurate account of the interaction of a photon with an inner boundary within the medium it is necessary to use a triangle mesh based model of the inner boundaries. The photon trajectory is calculated while it stays within a predefined area bounding the considered object. For the simulation of the brain sensing technique we introduced a class of detectors: in our algorithm a detector is a closed region on an outermost surface (or surfaces). During the simulation we store information about the weight of the photons that exit the tissue through each detector. As a result, we accumulate photon trajectories so that for each detector we know the total weight and the total trajectory of the photons which passed through it; for storing photon trajectory data we use a rectangular grid in three-dimensional space.

In our study we employed three approaches in order to compare results and understand how important it is to use a complex head model in the simulation. The easiest way to simulate brain sensing by ODS is to employ planar geometry, which can be done using the classical MCML code. A more accurate simulation can be performed using a spherical head model. The most accurate approach uses real human head geometry for constructing the tissue layer boundaries (2 186 446 triangles). With the developed code we simulated the dependence of the fraction of probing intensity registered by a detector on the source-detector separation (Fig. 4a). From this figure one can see that the different approaches provide different results and that accounting for the real head model is beneficial. This dependence also allows one to determine possible source-detector distances at which the signal is strong enough to be detected.

Several benchmarks were performed to evaluate the performance of our code. The cluster used for testing consists of 128 nodes with 2 Intel Xeon X5570 CPUs (2.93 GHz, 4 cores) and 12 GB RAM per node. In each run 16 million photon trajectories are simulated. Benchmark results are given in Fig. 4b, showing that the employment of distributed memory systems can significantly reduce the computational time of our algorithm because the scalability is almost linear. Code parallelization efficiency is about 87% on average.

Fig. 4. (a) Dependence of the fraction of probing power registered by the detector on the source-detector separation for probing wavelengths of 660 and 830 nm in optical diffuse spectroscopy of the brain for planar, spherical and real geometry. (b) Speedup of the simulation code depending on the number of cluster nodes for probing wavelengths of 660 and 830 nm.

In spite of the efficient employment of computational clusters for the simulation, the time required to obtain relevant results is still significant, so we are going to raise performance by improving the simulation algorithm and porting the code to GPU.
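A large share of the run time goes into intersecting photon paths with the triangle meshes; acceleration structures such as a BVH prune candidate triangles by first testing the ray against axis-aligned bounding boxes. A minimal slab-method ray-box test, purely illustrative:

```python
# Slab test: intersect the ray origin + t * direction (t >= 0) with the
# axis-aligned box [box_min, box_max] by clipping the parameter interval
# against each pair of axis-aligned planes.

def ray_aabb(origin, direction, box_min, box_max):
    t_near, t_far = 0.0, float("inf")
    for axis in range(3):
        o, d = origin[axis], direction[axis]
        lo, hi = box_min[axis], box_max[axis]
        if abs(d) < 1e-12:
            if o < lo or o > hi:
                return False      # ray parallel to slab and outside it
            continue
        t1, t2 = (lo - o) / d, (hi - o) / d
        if t1 > t2:
            t1, t2 = t2, t1
        t_near, t_far = max(t_near, t1), min(t_far, t2)
        if t_near > t_far:
            return False          # the parameter interval became empty
    return True
```

In a BVH the same test is applied recursively to nested boxes, so most triangles are rejected in logarithmic time without ever running a ray-triangle intersection.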

As a result. To evaluate the function the integrator employs a kernel with two-dimensional grid of threads that processes all of the elements of the array. For this one employs fast Givens rotations. which is much more accessible for analytical and numerical studies [39-41]. 38]. Interacting particles in 1D arrays A number of key biophysical transport processes (of charge in DNA or protons in photosynthetic proteins) reduce to wave propagation in effectively one-dimensional spatially inhomogeneous arrays. Hamilton operator of the considered model system has the form (1) . as a rule. We arrived at the two-dimensional lattice problem on the data space as given by the twodimensional array of values for wave function amplitudes. particular attention is paid to the problem of two interacting particles. As the problem belongs to the class of SIMD. solving the problem for the eigenvalues and eigenvectors of a sparse matrix (2) and conducting direct numerical integration of the discrete Schrodinger equation for two-particle amplitudes of the probability density (3) Dynamics. Householder reflections. We hypothesized that practical restrictions on the size of arrays that could be computationally studied obscure the true picture of the wave dynamics. Primarily the matrix is converted to the three-diagonal form. For the software technology we selected CUDA. on the same graphics card Tesla M2070 was reduced from one week to one day. and existing work in this area bears more qualitative than strict mathematical character. While it would be challenging to prove delocalizing effect of interactions. there are some graphics cards that can be used in parallel. However. two approaches: analyze their own state of the lattice. the many-particle quantum problem is characterized by exceptional complexity.16 S. in case of non-interacting particles (or a single particle) inhomogeneity leads to Anderson localization of waves and halt of transport [37. 
it is appropriate to use a graphics accelerator. U force interactions. suitable for high-performance heterogeneous computing and revisit the localization of two-particle states [42]. Remarkably. We used where b+. the current evidence indicated only an increase in the volume of localization compared to single-particle states. The use of parallel programming techniques enabled us to move forward in this issue. As it is wellknown. in CUDA kernel the local tunneling of particles between potential wells can be confined to a single global memory. b are creation and annihilation operators. Molecular dynamics simulation. Eigenvalue problem. Lanczos method. . and implemented a symplectic numerical integration scheme (the pq-method [42]). An additional advantage of the GPU is that on one compute node. The currently open and much studied problem is the dynamics of particles in the presence of interactions. In this context. Bastrakov et al. which reduces the complexity of the algorithms. / Procedia Computer Science 18 (2013) 10 – 19 5. the duration of a typical computational task with the same parameters as in the sequential version. The search for eigenvalues and eigenvectors was implemented by iterative algorithms. Then iterative methods or QR-decomposition are applied. To calculate the spectrum of the matrix of the system (2) and obtain its eigenvectors we used the library SLEPc [43] extension of PETSc library for sparse algebra. We aimed at developing the computationally efficient parallel algorithms.
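The pq-method can be illustrated on the single-particle 1D discrete Schrodinger equation i d(psi_n)/dt = eps_n psi_n - J (psi_{n-1} + psi_{n+1}): splitting psi = p + i q turns it into the canonical pair dp/dt = H q, dq/dt = -H p, which is advanced by a symplectic (semi-implicit) step. This sketch is a simplified stand-in for the two-particle scheme of [42]; all names are illustrative:

```python
# Sketch of a symplectic p-q step for the 1D discrete Schrodinger
# equation with on-site energies eps and hopping J.

def apply_h(a, eps, J):
    """Apply the tridiagonal lattice Hamiltonian to amplitude array a."""
    n = len(a)
    out = [eps[k] * a[k] for k in range(n)]
    for k in range(n):
        if k > 0:
            out[k] -= J * a[k - 1]
        if k < n - 1:
            out[k] -= J * a[k + 1]
    return out

def pq_step(p, q, eps, J, dt):
    """One semi-implicit step: update p from the old q, then q from
    the new p (psi = p + i q gives dp/dt = H q, dq/dt = -H p)."""
    hq = apply_h(q, eps, J)
    p = [p[k] + dt * hq[k] for k in range(len(p))]
    hp = apply_h(p, eps, J)
    q = [q[k] - dt * hp[k] for k in range(len(q))]
    return p, q
```

Because the step is symplectic, the norm of the wave function oscillates around its initial value instead of drifting secularly, so long integration times remain trustworthy. On the GPU the same update is evaluated by one thread per amplitude, touching only nearest-neighbor entries.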

We found that the interaction leads to a renormalization of the self-energies of the two-particle states. The renormalization is different for different classes of Fock states, which leads to their resonant interaction in a certain range of the parameter U and, eventually, to the delocalization of the eigenstates [42]. Remarkably, the relative position of the delocalized states of the particles is such that the distance between them does not exceed the single-particle localization length. This allowed us to characterize this state of the system as a "correlated metal". Solving the dynamical equations (3) and measuring the effective wave packet width, one confirms the unbounded spreading of the wave packet in the predicted parameter region and even obtains a detailed map of the regimes in the parameter space (Fig. 5).

Fig. 5. Left: from localization to wave packet spreading for two interacting particles, U=2 and U=4.5, respectively; the colorbar corresponds to the wave packet local norm in log scale. Right: diagram of the transport regimes in the interaction-inhomogeneity parameter plane; the colorbar corresponds to the effective wave packet width [42].

The task of finding all the eigenvectors is computationally time consuming: using 100 nodes of the Lomonosov cluster, for N = 100 it takes 137 iterations and 20 hours, and for N = 350 it takes 57 iterations and 70 hours. The computation of eigenvalues from the middle of the spectrum of the matrix uses, for example, harmonic extraction. With current hardware, solving this problem for large matrices is difficult because of the large memory consumption. Further work on the problem is to reduce the considered share of the matrix spectrum.

6. Conclusions

PICADOR [3], our implementation of the Particle-in-Cell method for plasma simulation, is based on the MPI, OpenMP and OpenCL parallel programming technologies and is capable of utilizing the computational resources of cluster systems with one or several CPUs and GPUs per node. It achieves 85% strong scaling efficiency and 90% weak scaling efficiency on 2048 CPU cores of a cluster system. PICADOR also scales well on GPUs: the weak scaling efficiency is 64% on 512 NVIDIA Tesla X2070 GPUs relative to 16 GPUs. The key aspect of the implementation for GPUs is the clustering of particles into supercells, which allows efficient utilization of global and local memory. We implemented a static rectilinear partitioning scheme for problems with a nonuniform particle distribution; our experiments with the simulation of laser compression using 512 MPI processes show a speedup of 2.5 times compared to the uniform decomposition scheme. We plan to improve the load balance by implementing adaptive domain decomposition. The most time consuming mathematical kernel is the numerical integration of the particle equations of motion.

The simulation of heart activity in the 2D case can be parallelized via decomposition of the simulation area, resulting in close to linear scalability. For 3D simulation we use state-of-the-art models based on biologically relevant systems of ODEs and take into account realistic heart geometry. The scheme is based on FEM computations and allows performing large-scale simulation of cardiac dynamics with high precision.

The most time consuming mathematical kernel is the repeated invocation of a sparse SLAE solver. For stationary tasks the matrix of the SLAE does not change, so one could use a direct solver to improve performance: apply the nested dissection algorithm for matrix reordering to minimize fill-in, run the matrix factorization algorithm once, and then call the backward Gauss (triangular) solver on every step. For non-stationary tasks using a direct solver would result in multiple invocations of the computationally intensive matrix factorization. For significantly large systems the matrix does not fit in the memory of a single node, thus requiring distributed memory systems; in this case a promising way [44] is to use a parallel iterative solver on cluster systems. Since the right-hand side of the SLAE on the current step depends on the solution on the previous step, the main candidate for parallelization is the SLAE solver itself.

Simulation of brain sensing by optical diffuse spectroscopy consists of two computationally intensive parts: the simulation of light propagation, and the search for intersections of photon trajectories with the layer boundaries. Simulation of light propagation in a turbid medium is based on the Monte Carlo method, which allows computing a large number of random photon trajectories independently; therefore one can obtain good scalability for this problem on both shared and distributed memory systems. The only limiting factor is the size of the available memory: each photon has its own trajectory, so each thread requires a separate memory region. Modern CPUs can execute tens of threads in parallel, so hundreds of megabytes of memory are required; such memory requirements are acceptable for CPU calculations. However, for GPUs with thousands of parallel threads the required memory size is larger and can reach several gigabytes, which makes it difficult to use GPUs for the considered task. Memory consumption can be reduced by adjusting the parameters of the method; another way to reduce it is the usage of more economical data structures. The search for intersections takes about 80% of the total computational time, therefore it is important to use an appropriate structure for accelerating the search; in our algorithm a BVH tree is employed.

Molecular dynamics simulation in the given problem statement is also time consuming: it takes more than 70 hours using 100 nodes of a cluster. The most computationally intensive step is finding all the eigenvectors of a sparse matrix of a special type. We solve this task on a cluster system and use the SLEPc library. We consider two ways to improve performance: reducing the computed part of the spectrum, and using a parallel sparse SLAE solver with the SLEPc library. However, in this case the found eigenvectors may not be mutually orthogonal, affecting the interpretation of the results.

We present our future plans in Table 1.

Table 1. Further developments

Plasma simulation.
  Implementation for GPUs: done; the supercells method is used.
  Load balancing on clusters: partially done; static rectilinear decomposition is used, an adaptive decomposition scheme will be implemented.

Heart activity simulation.
  Implementation for GPUs: no plans.
  Load balancing on clusters: will be done by using FEM mesh decomposition and a scalable SLAE solver.

Brain sensing simulation.
  Implementation for GPUs: will be done by using a BVH tree on GPUs.
  Load balancing on clusters: not required for Monte Carlo simulation.

Molecular dynamics simulation.
  Implementation for GPUs: dynamics simulation implemented for GPUs; no plans of porting the solution of the eigenvalue problem to GPUs.
  Load balancing on clusters: partially done by using a scalable eigensolver; could be improved by using a relevant SLAE solver inside the eigensolver.
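The factor-once strategy for stationary tasks can be sketched in a few lines; here a dense Cholesky factorization stands in for the sparse direct solver with nested dissection reordering, and the matrix is a small illustrative example:

```python
# Sketch: factor the (fixed) SPD matrix once, then each time step costs
# only a forward and a backward triangular solve.

def cholesky(A):
    """Return lower-triangular L with A = L L^T (A must be SPD)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = (A[i][i] - s) ** 0.5
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_with_factor(L, b):
    """Solve A x = b given the precomputed factor L."""
    n = len(L)
    y = [0.0] * n                 # forward substitution: L y = b
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n                 # backward substitution: L^T x = y
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
L = cholesky(A)                   # expensive step, done once
x = [1.0, 1.0, 1.0]
for _ in range(3):                # each step: cheap triangular solves only
    x = solve_with_factor(L, x)
```

The factorization is the expensive part; for non-stationary tasks, where the matrix changes every step, this advantage disappears, which is why an iterative parallel solver becomes preferable there.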
Acknowledgements

The work has been supported by the following grants: Scientific School #1960.2012.9; Government Contracts #11.519.11.2015, #11.519.11.2022, #14.B37.21.0878 and #14.B37.21.0393; grants of the President of the Russian Federation MK-1652.2012.2 and MK-4028.2012.2; RFBR grant 12-02-31403; and the Dynasty foundation.
Factorization of the matrix requires storing of a dense rotation matrix of the same size.2012.
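To illustrate the factorize-once strategy for stationary tasks, the sketch below uses SciPy's SuperLU as a stand-in for the direct solver discussed in the text; the tridiagonal matrix and the right-hand-side update rule are illustrative assumptions, not the actual cardiac FEM system.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Toy SPD system standing in for the FEM matrix (assumption: a simple
# tridiagonal operator, not the real cardiac model).
n = 200
A = sp.diags([-1.0, 3.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")

# Stationary task: reorder and factorize once. SuperLU applies a
# fill-reducing ordering internally, playing the role of the nested
# dissection reordering mentioned in the text.
lu = spla.splu(A)

# Each time step now needs only the cheap triangular (forward/backward
# Gauss) solves; the RHS depends on the previous solution.
x = np.zeros(n)
for step in range(50):
    b = 0.5 * x + 1.0        # assumed toy update rule for the RHS
    x = lu.solve(b)          # substitution only, no refactorization

residual = np.linalg.norm(A @ x - (0.5 * x + 1.0))
```

Because the factorization is hoisted out of the time loop, the per-step cost drops from O(factorization) to two triangular solves, which is exactly why the direct approach pays off only when the matrix is fixed.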
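A minimal sketch of why Monte Carlo photon transport scales so well: every trajectory is independent, so batches can run on separate threads or nodes with private RNG state and be combined by a single reduction. The isotropic-scattering weight model and medium parameters below are simplified assumptions, not the actual MCML-style kernel.

```python
import numpy as np

def simulate_photons(n_photons, mu_a=1.0, mu_s=10.0, seed=0):
    """Toy Monte Carlo photon transport: isotropic scattering in an
    infinite homogeneous medium with implicit-capture weighting.
    Returns the mean path length travelled before the photon weight
    drops below a cutoff."""
    rng = np.random.default_rng(seed)   # private RNG state per worker
    mu_t = mu_a + mu_s                  # total interaction coefficient
    total_path = 0.0
    for _ in range(n_photons):
        weight, path = 1.0, 0.0
        while weight > 1e-4:
            path += -np.log(rng.random()) / mu_t   # sampled free path
            weight *= mu_s / mu_t                  # partial absorption
        total_path += path
    return total_path / n_photons

# "Distributed" run: independent batches with distinct seeds, exactly as
# separate workers would execute them, followed by one reduction.
estimates = [simulate_photons(500, seed=s) for s in range(4)]
mean_path = sum(estimates) / len(estimates)
```

The per-photon state (here just `weight` and `path`; in the real code a full trajectory) is what makes each thread need its own memory region, as noted in the text.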
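To illustrate how a BVH accelerates the intersection search, here is a compact sketch under assumed geometry (spheres as stand-in primitives rather than tissue layer boundaries): a median-split BVH whose traversal prunes whole subtrees with a ray-box slab test, validated against brute force.

```python
import numpy as np

class BVHNode:
    """Axis-aligned bounding box node; leaves store one primitive index."""
    __slots__ = ("lo", "hi", "left", "right", "idx")
    def __init__(self, lo, hi, left=None, right=None, idx=None):
        self.lo, self.hi, self.left, self.right, self.idx = lo, hi, left, right, idx

def build(centers, radii, idx):
    lo = (centers[idx] - radii[idx, None]).min(axis=0)
    hi = (centers[idx] + radii[idx, None]).max(axis=0)
    if len(idx) == 1:
        return BVHNode(lo, hi, idx=int(idx[0]))
    axis = int(np.argmax(hi - lo))                 # split the widest axis
    order = idx[np.argsort(centers[idx, axis])]    # median split
    mid = len(order) // 2
    return BVHNode(lo, hi, build(centers, radii, order[:mid]),
                   build(centers, radii, order[mid:]))

def hit_box(o, inv_d, lo, hi):
    """Slab test: does the ray o + t*d (t >= 0) intersect the box?"""
    t1, t2 = (lo - o) * inv_d, (hi - o) * inv_d
    tmin = np.minimum(t1, t2).max()
    tmax = np.maximum(t1, t2).min()
    return tmax >= max(tmin, 0.0)

def hit_sphere(o, d, c, r):
    """Nearest positive intersection parameter, or inf on a miss."""
    oc = o - c
    b = oc @ d
    disc = b * b - (oc @ oc - r * r)
    if disc < 0.0:
        return np.inf
    t = -b - np.sqrt(disc)
    return t if t > 0.0 else np.inf

def nearest(node, o, d, centers, radii):
    inv_d = 1.0 / d            # assumes no zero direction components
    stack, best = [node], np.inf
    while stack:
        n = stack.pop()
        if not hit_box(o, inv_d, n.lo, n.hi):
            continue           # prune the whole subtree
        if n.idx is not None:
            best = min(best, hit_sphere(o, d, centers[n.idx], radii[n.idx]))
        else:
            stack += [n.left, n.right]
    return best

# Validate the traversal against brute force on a random scene.
rng = np.random.default_rng(1)
centers = rng.uniform(-5, 5, size=(200, 3))
radii = rng.uniform(0.1, 0.4, size=200)
root = build(centers, radii, np.arange(200))
o = np.array([-10.0, 0.1, 0.2])
d = np.array([1.0, 0.02, 0.01]); d = d / np.linalg.norm(d)
t_bvh = nearest(root, o, d, centers, radii)
t_brute = min(hit_sphere(o, d, centers[i], radii[i]) for i in range(200))
```

The pruning step is the whole point: a subtree whose bounding box the ray misses is skipped without touching any of its primitives, which is what turns the per-segment search from linear to roughly logarithmic in the number of boundary elements.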
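The eigenproblem above is solved with SLEPc on a cluster; as a small-scale stand-in, the sketch below uses SciPy's ARPACK wrapper to compute only a few extreme eigenpairs of a sparse symmetric matrix and checks their mutual orthogonality, the concern raised in the text. The disordered lattice operator is a hypothetical example, not the authors' actual matrix.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

# Hypothetical stand-in operator: a 1D lattice with random on-site
# disorder, i.e. just some sparse symmetric matrix "of a special type".
n = 400
rng = np.random.default_rng(42)
H = sp.diags([np.full(n - 1, -1.0),
              rng.uniform(-2.0, 2.0, n),
              np.full(n - 1, -1.0)],
             [-1, 0, 1], format="csr")

# Compute only k extreme eigenpairs instead of the full spectrum.
k = 10
vals, vecs = eigsh(H, k=k, which="SA")   # ARPACK, smallest algebraic

# Check residuals and mutual orthogonality of the returned vectors:
# iterative eigensolvers can lose orthogonality without
# reorthogonalization, which would affect result interpretation.
resid = np.linalg.norm(H @ vecs - vecs * vals)
ortho_err = np.abs(vecs.T @ vecs - np.eye(k)).max()
```

Restricting the computation to a small part of the spectrum keeps both the work and the memory footprint bounded, at the price of having to verify that the returned basis is still orthonormal.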

References

[1] UNN HPC Competence Center. http://hpc-education.unn.ru/
[2] Birdsall C.K., Langdon A.B. Plasma Physics via Computer Simulation. Taylor & Francis Group, 2004.
[3] Bastrakov S., Donchenko R., Gonoskov A., Malyshev A., Meerov I., Surmin I., et al. Particle-in-cell plasma simulation on heterogeneous cluster systems. Journal of Computational Science 3 (2012) 474-479.
[4] Taflove A., Hagness S.C. Computational Electrodynamics: The Finite-Difference Time-Domain Method. Artech House, 2005.
[5] Paul K., et al. Benchmarking the codes VORPAL, OSIRIS, and QuickPIC with Laser Wakefield Acceleration Simulations. AIP Conference Proceedings 1086 (2009) 315-320.
[6] MathGL library. http://mathgl.sourceforge.net/
[7] Nicol D.M. Rectilinear partitioning of irregular data parallel computations. J. Parallel Distrib. Comput. 23 (1994) 119-134.
[8] Grandi E., Pasqualini F.S., Bers D.M. A novel computational model of the human ventricular action potential and Ca transient. Journal of Molecular and Cellular Cardiology 48 (2010) 112-121.
[9] Grandi E., Puglisi J.L., Wagner S., Maier L.S., Severi S., Bers D.M. Biophys. J. 93(11) (2007).
[10] Livshitz L., Rudy Y. American Journal of Physiology: Heart and Circulatory Physiology 292 (2007) H2854.
[11] Moreno A., et al.
[12] Stewart P., Aslanidi O.V., Boyett M.R., Noble D., Noble P.J., Zhang H. Mathematical models of the electrical action potential of Purkinje fibre cells. Phil. Trans. of the Royal Society A 367 (2009) 2225.
[13] Weiss J.N., Karagueuzian H.S., Garfinkel A., Chen P.S., Qu Zh. Circ. Res.
[14] Hooke N., Henriquez C., Lanzkron P., Rose D. Math. Biosci. 120 (1994) 127.
[15] Trayanova N. Whole-heart modeling: applications to cardiac electrophysiology and electromechanics. Circ. Res. 108 (2011) 113-128.
[16] Ten Tusscher K.H.W.J., Noble D., Noble P.J., Panfilov A.V. A model for human ventricular tissue. American Journal of Physiology: Heart and Circulatory Physiology 286 (2004) H1573-H1589.
[17] Sachse F.B. Computational Cardiology: Modeling of Anatomy, Electrophysiology, and Mechanics. Springer, 2004.
[18] Hodgkin A.L., Huxley A.F. A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol. 117 (1952) 500-544.
[19] Fink M., Noble D., Virag L., Varro A., Giles W.R. Contributions of HERG K+ current to repolarization of the human ventricular action potential. Progress in Biophysics and Molecular Biology 96 (1-3) (2008) 357-376.
[20] Courtemanche M., Ramirez R.J., Nattel S. Ionic mechanisms underlying human atrial action potential properties: insights from a mathematical model. American Journal of Physiology 275 (1998) H301. PubMed ID: 9688927.
[21] Nygren A., Fiset C., Firek L., Clark J.W., Lindblad D.S., Clark R.B., Giles W.R. Mathematical model of an adult human atrial cell. Circ. Res. 82 (1998) 63.
[22] Stewart P., Aslanidi O.V., Boyett M.R., Noble D., Noble P.J., Zhang H. Mathematical models of the electrical action potential of Purkinje fibre cells. Phil. Trans. of the Royal Society A 367 (2009) 2225.
[23] Hoper C. Modellierung der Anatomie und Elektrophysiologie des menschlichen Vorhofs. Diploma Thesis.
[24] CellML. http://models.cellml.org/
[25] ParaView. http://www.paraview.org/
[26] Intel Threading Building Blocks. http://software.intel.com/en-us/intel-tbb
[27] PETSc. http://www.mcs.anl.gov/petsc/
[28] Intel Math Kernel Library. http://software.intel.com/en-us/intel-mkl
[29] Kozinov E., Lebedev I., Lebedev S., Malova A., Meyerov I., Sysoev A., et al. A novel linear equation solver for sparse systems with a symmetric positive definite matrix. Vestnik of Lobachevsky State University of Nizhni Novgorod 5(2) (to appear).
[30] Hillman E.M.C. Optical brain imaging in vivo: techniques and applications from animal to man. J. Biomed. Opt. 12 (2007) 051402.
[31] Zeff B.W., White B.R., Dehghani H., Schlaggar B.L., Culver J.P. Retinotopic mapping of adult human visual cortex with high-density diffuse optical tomography. Proc. Natl. Acad. Sci. USA 104 (2007) 12169-12174.
[32] Dehghani H., et al. Near infrared optical tomography using NIRFAST: Algorithm for numerical model and image reconstruction. Commun. Numer. Meth. Eng. 25 (2009) 711-732.
[33] Wang L., Jacques S.L., Zheng L. MCML - Monte Carlo modeling of light transport in multi-layered tissues. Comput. Meth. Programs Biomed. 47 (1995) 131-146.
[34] Boas D., Culver J., Stott J., Dunn A. Three dimensional Monte Carlo code for photon migration through complex heterogeneous media including the adult human head. Opt. Express 10 (2002) 159-170.
[35] Gorshkov A.V., Kirillin M.Yu. Monte Carlo simulation of brain sensing by optical diffuse spectroscopy. Journal of Computational Science 3 (2012) 498-503.
[36] Li H., Tian J., Zhu F., Cong W., Wang G., et al. A mouse optical simulation environment (MOSE) to investigate bioluminescent phenomena in the living mouse with the Monte Carlo method. Acad. Radiol. 11 (2004) 1029-1038.
[37] Anderson P.W. Absence of diffusion in certain random lattices. Phys. Rev. 109 (1958) 1492.
[38] Aubry S., Andre G. Analyticity breaking and Anderson localization in incommensurate lattices. Ann. Israel Phys. Soc. 3 (1980) 133.
[39] Shepelyansky D.L. Phys. Rev. Lett. 73 (1994) 2607.
[40] Imry Y. Europhys. Lett. 30 (1995) 405.
[41] Krimer D.O., Khomeriki R., Flach S. JETP Lett. 94 (2011) 406.
[42] Flach S., Ivanchenko M.V., Khomeriki R. Europhys. Lett. 98 (2012) 66002.
[43] SLEPc. http://www.grycap.upv.es/slepc/
[44] Niederer S., Mitchell L., Smith N., Plank G. Simulating human cardiac electrophysiology on clinical time scales. Front. Physiol. 2 (2011) 14.