Professional Documents
Culture Documents
A.MAJOR FEATURES A cheap but modern desktop computer using, for example, a
Pentium 4 or Athlon 64 CPU, typically runs at a clock
The Blue Gene/L supercomputer was unique in the following frequency in excess of 2 GHz and provides computational
aspects: performance in the range of a few GFLOPS. Even some video
game consoles of the late 1990s' vintage, such as the
• Trading the speed of processors for lower power Gamecube and Dreamcast had performance in excess of one
consumption. Blue Gene/L used low frequency and GFLOPS .
low power embedded PowerPC cores with floating
point accelerators. While the performance of each The original supercomputer, the Cray-1, was set up at Los
chip was relatively low, the system could achieve Alamos National Laboratory in 1976. The Cray-1 was capable
better performance to energy ratio, for applications of 80 MFLOPS (or, according to another source, 138–250
that could use larger numbers of nodes. MFLOPS). In fewer than 30 years since then, the
• Dual processors per node with two working modes: computational speed of supercomputers has jumped a
co-processor mode where one processor handles millionfold.
computation and the other handles communication;
and virtual-node mode, where both processors are The fastest computer in the world as of November 5, 2004, the
available to run user code, but the processors share IBM Blue Gene supercomputer, measures 70.72 TFLOPS.
both the computation and the communication load. This supercomputer was a prototype of the Blue Gene/L
• System-on-a-chip design. All node components machine IBM is building for the Lawrence Livermore
were embedded on one chip, with the exception of National Laboratory in California. During a speed test on 24
512 MB external DRAM. March 2005, it was rated at 135.5 TFLOPS. Blue Gene's new
• A large number of nodes (scalable in increments of record was achieved by doubling the number of current racks
1024 up to at least 65,536) Three-dimensional to 32. Each rack holds 1,024 processors, yet the chips are the
torus interconnect with auxiliary networks for global same as those found in high-end computers. The complete
communications (broadcast and reductions), I/O, version will have a total of 64 racks and a theoretical speed
and management . measured at 360 TFLOPS.
• Lightweight OS per node for minimum system
overhead (system noise). II. ARCHITECTURE DETAILS
V. APPLICATION
VII. REFERENCES
• www.research.ibm.com/bluegene
• www.research.ibm/journals/sj/402/allen.html
• www.studymafia.org
• www.bio-itworld.com/news/071503-
report2898.html
• www.linuxdevices.com
• Adiga et al., (2002). An Overview of the BG/L
Supercomputer. Proceedings of the 2002
Supercomputing Conference www.scconference.
org/sc2002/
• Benveniste, C. and Heidelberger, P. (1995). Parallel
Simulation of the IBM Interconnection Network. In
Proceedings of the 1995 Winter Simulation
Conference. IEEE Computer Society Press, 584 –
589.