Current IT infrastructure:
The institute has a computational facility involving a wide network of roughly 500 machines, comprising workstations and PCs. A fast Ethernet network comprising layer 3 and layer 2 switches, with a mix of copper and fibre connectivity, is also provided across the institute. The institute is also part of an extensive grid - GARUDA - the national grid initiative from CDAC. The institute also has a 34-node Linux cluster on Gigabit interconnect, based on single-core, single-socket compute nodes.
Some of the key hardware and software being used:
Hardware: Cray X1E, Sunblade 2000, Sun Ultrasparc, HP and Dec-Alpha workstations, Linux servers, Linux cluster.
Operating systems: Unicos/MP, Sun Solaris, IBM AIX, HP-UX, several flavors of Linux, Digital Unix 4.0D and different versions of Windows.
Compilers & applications software: C and Fortran compilers from various vendors, ANSYS, NAG, NCAR, MATLAB, IMSL, Serenade, Ansoft, Tornado, IDL, CICA, Visions, etc.
IPR wanted to deploy a 32-node cluster based on the Intel Core microarchitecture. The cluster was to have a low-latency, high-bandwidth 4x DDR InfiniBand interconnect with a bandwidth of 20 Gbps. The vendor was to deliver end-to-end cluster infrastructure based on dual-CPU Intel servers and a 96-port InfiniBand switch. This cluster had to be scalable to a larger one in the future, and hence the switching infrastructure was to be deployed with greater expandability. Besides the hardware, the cluster provider also had to supply the necessary cluster software suite, comprising cluster managers, schedulers, debuggers, MPI implementations and other libraries and cluster tools that could enhance performance and make the cluster easily manageable. The applications primarily involved the following areas of work:
1. Computational Fluid Dynamics
2. Electrodynamics
3. Hydrodynamics
4. Differential Equations
5. Linear Matrices
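For context, interconnect latency and bandwidth figures of the kind specified above are typically verified with a simple two-rank ping-pong test run across two nodes. The sketch below is illustrative only; the message size, iteration count and reported units are assumptions and not part of the tender.

/* Illustrative MPI ping-pong microbenchmark (assumed parameters, not from the tender).
 * Run with two ranks placed on different nodes, e.g.:
 *   mpicc pingpong.c -o pingpong && mpirun -np 2 -H node1,node2 ./pingpong
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;      /* 1 MiB message (assumed size) */
    const int iters = 100;      /* assumed iteration count */
    char *buf = malloc(n);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, n, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, n, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, n, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, n, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0) {
        double rtt = (t1 - t0) / iters;        /* average round-trip time per iteration */
        double bw  = 2.0 * n / rtt / 1e9;      /* GB/s, counting both directions */
        printf("avg round trip: %.1f us, effective bandwidth: %.2f GB/s\n",
               rtt * 1e6, bw);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}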
The challenges at the customer end were:
1. Some of the existing codes use MPI, and when run on an Ethernet-connected cluster their scalability hit a ceiling at 16 CPUs. This needed to be scaled up using a low-latency, high-bandwidth interconnect, and hence InfiniBand was one key choice.
2. Since the existing cluster was based on single-core, single-socket nodes, the larger problem sizes and codes took longer to run; turnaround times (TAT) were therefore larger, and hence the need for a faster cluster so that TAT could be reduced.
3. Some of the applications are such that a linear scale-up from single-core, single-socket to dual-core, dual-socket nodes cannot be expected.
4. The customer targeted the efficiency of the cluster to be close to 70% of Rpeak (theoretical GFLOPS performance); a worked example of what this implies is given after this list.
5. Some of the open-source cluster tools, such as Rocks and OSCAR, cannot be scaled to larger clusters without customization and tuning.
Besides the above challenges on the application front, the key challenge was to win the solution against the likes of IBM, HP and Dell in an open tender situation.
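To illustrate the 70% efficiency target in point 4 (the clock speed used here is an assumption for the arithmetic only, not a figure from the tender): a 32-node cluster of dual-socket, dual-core Intel Core microarchitecture nodes has 128 cores; at an assumed 3.0 GHz and 4 double-precision floating-point operations per core per cycle, Rpeak = 128 x 4 x 3.0 = 1536 GFLOPS, so a 70% efficient HPL run would need to sustain roughly 1075 GFLOPS.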