The National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign opened in 1986 as one of the five original National Science Foundation (NSF) Supercomputer Centers. During the decade covered by that program, the center earned an international reputation for innovative applications in high-performance computing, visualization, and desktop software. NCSA greatly broadened the user base of remote supercomputing and the Internet by developing the cross-platform software tool NCSA Telnet in 1987. In 1992, the center developed NCSA Mosaic, the first easily available graphical Web browser, which helped launch the explosive growth of the Web in the 1990s. Its current work on teraflop-scale Linux* clusters and future endeavors with distributed computing grids continue NCSA's long tradition of groundbreaking work with information technologies.
Home to some of the fastest computers in the world, the NCSA at the University of Illinois in Champaign, Illinois, is the lead institution for the National Computational Science Alliance (the Alliance), which is funded by the National Science Foundation. The NCSA's mission is to provide computing cycles and software support for the science and engineering community. "That means we work with them to tune, optimize, and deploy applications on high-end computing facilities, to manage their data, and to provide access to those facilities through high-performance networks," says Dan Reed, director of the NCSA. "As part of that effort, we build and deploy a lot of technology, ranging from scientific visualization and numerical software to networking optimization tools and libraries: the whole spectrum of software support for high-performance computing."
What really happens when black holes collide? What is the nature of the force binding nuclei? What is the likelihood that a storm front over Oklahoma will spawn tornadoes? From basic research to weather questions and aerodynamic problems for airplane manufacturers, science and industry are looking for answers that can only come from years of computing time using the world's fastest computers.
According to Dan Reed, "There's no upper bound on the level of performance that today's researchers want and need. If we had a petaflops (a machine capable of 1,000 trillion floating-point operations per second) on the floor, researchers would clamor for access, and it would be saturated in a few days."
While petaflops machines are still in the future, teraflops (one trillion floating-point operations per second) systems exist now, and a look at the teraflops systems at NCSA shows that the Linux operating system is very much a part of today's technology for terascale computing. NCSA's users were asking for terascale capabilities, and they specifically wanted the Linux operating system so they could run a variety of community-developed research codes.
The (Previously) High Price of Terascale Computing
To deliver close to a trillion operations per second, NCSA had been relying on special-purpose systems from SGI and Cray. These high-performance computing (HPC) machines commanded a commensurately high price, using proprietary processors and specialized subsystems to achieve their impressive performance. And while the next generation of HPC machines promised improved performance, the price was becoming prohibitive, especially in light of the price/performance of Intel®-based platforms. With the introduction of the 64-bit Intel® Itanium® processor, NCSA had a clear route to the future of terascale computing via Linux clusters. In simple terms, a Linux cluster aggregates a large number of smaller Linux machines into a single, more powerful parallel system. NCSA was eager to acquire Itanium-based workstations and begin porting its very complex scientific applications to fully exploit the new platform. This second cluster of workstations was designed to augment the center's existing Linux cluster of 512 dual-processor 32-bit Intel® Pentium® III machines.
To upgrade its computing capacity to keep pace with users' needs, NCSA decided to work with IBM to create the largest Linux cluster in the academic world. Having successfully built smaller clusters, the supercomputing center was satisfied that clustering industry-standard Intel processor-based hardware was the most economical and space-efficient way to create a terascale computing system. More importantly, some of the early users of NCSA's first Linux cluster had already proven the capabilities of this type of aggregated system.
Solution Configuration and Implementation
Early this spring, a team from IBM designed and assembled a new NCSA Linux cluster solution using 512 IBM eServer xSeries x330* thin servers, each with two 32-bit Intel Pentium III processors running Red Hat* Linux. This cluster currently ranks among the top ten supercomputers in the world. During the summer of this year, a second cluster was constructed using IBM IntelliStation* Z Pro 6894 systems, bringing the combined clusters to a theoretical peak performance of two teraflops. In the course of this implementation, NCSA found that these clusters cost just 14 percent to 33 percent of the price of supercomputers built with special-purpose hardware.
The individual system configuration (or "node," in cluster parlance) of the second, Itanium-based cluster comprises 160 IBM IntelliStation Z Pro 6894 workstations running Red Hat Linux 7.2 (Seawolf). Each IntelliStation features two 64-bit Intel Itanium processors at 800MHz with 4MB L2 cache, 2GB of ECC SDRAM main memory (expandable to 16GB), an 18.2GB Ultra160 SCSI hard disk drive for the Linux operating system, and two 36.4GB Ultra160 SCSI hard drives for local application scratch space. In addition to Ethernet networks, both Linux clusters at NCSA are interconnected with a high-speed switch for inter-processor communication within the cluster, manufactured by IBM Business Partner Myricom, Inc. Myrinet* is a low-latency cluster network communication system that enables all the compute nodes of the cluster to behave as a single supercomputer, passing messages back and forth and performing distributed computations using the Message Passing Interface (MPI) environment. MPI is a library specification for message passing, proposed as a standard by a broadly based committee of vendors, implementors, and academic computing users.
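The message-passing pattern that MPI standardizes can be sketched in a few lines. The example below is illustrative only: it uses Python's standard multiprocessing module as a stand-in for MPI ranks, where a real cluster code would call MPI routines such as MPI_Send and MPI_Reduce over the Myrinet interconnect. The function names are hypothetical, not NCSA code.

```python
# Sketch of the message-passing model: each "rank" computes a partial
# result on its own slice of the data and sends it to rank 0, which
# combines them (the pattern behind MPI_Reduce).
from multiprocessing import Process, Queue

def worker(rank, nranks, n, queue):
    # Block decomposition: rank r owns indices [r*n//nranks, (r+1)*n//nranks).
    lo, hi = rank * n // nranks, (rank + 1) * n // nranks
    partial = sum(range(lo, hi))
    queue.put(partial)  # analogous to an MPI send to rank 0

def reduce_sum(n=1000, nranks=4):
    queue = Queue()
    procs = [Process(target=worker, args=(r, nranks, n, queue))
             for r in range(nranks)]
    for p in procs:
        p.start()
    # Rank 0 collects one partial result per rank and reduces them.
    total = sum(queue.get() for _ in range(nranks))
    for p in procs:
        p.join()
    return total

if __name__ == "__main__":
    print(reduce_sum())  # sum of 0..999 = 499500
```

The block decomposition guarantees the slices tile the full range exactly, so the reduced total matches a serial sum regardless of how many ranks participate.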
IBM Global Services (IGS) procured the IBM and OEM hardware and performed some preassembly in racks at the IBM Customer Support Center in Rochester, Minnesota. IBM Linux cluster specialists provided on-site integration of both clusters at NCSA and built the initial Linux software environment using special-purpose software called xCAT (xSeries Cluster Administration Tools). IGS is also providing ongoing cluster system management and a two-year Linux support line contract. The IBM Netfinity* Advanced Technology System group provided the initial architecture and configuration requirements for the Pentium processor-based cluster, authoring and subsequently modifying the xCAT software for this project on the Itanium-based Linux cluster. (See the IBM Redbook Linux HPC Cluster Installation for details on xCAT.)
"IBM clearly has a major commitment to Linux, and that's why we chose to work with them," says Rob Pennington, associate director, computing and communications division, NCSA. "IBM has been in the Linux space for a while now, working very hard and very publicly on the Linux projects. The IBM professionals we have been working with understand the issues of high-performance computing, and they understand the importance of Linux," notes Pennington. "IBM Global Services has been able to deliver a total solution, literally, right onto our machine room floor. The architecture, the process of bringing all the components together, and the assembly have been transparent to us."
Key Benefits of an Intel Itanium Processor-Based Solution
Clusters of high-performance systems such as IBM IntelliStation workstations and xSeries servers offer multiple benefits for supercomputing centers such as NCSA, including high memory addressability, increased performance, ease of incrementally scaling clusters, and a choice of enterprise operating systems, including Linux. "The price/performance ratio is very attractive," says Pennington. "[Intel] processors are as fast as many of the special-purpose processors. Also, the effective lifetime of the systems on our machine-room floor is only about two years. Every other year or so we turn over the systems and replace them. It gets extremely expensive to do this with special-purpose hardware. With [industry-standard] systems, we can do incremental upgrades as time passes and faster processors and better configurations come out. So the transition from phase to phase with [Intel] systems is much easier, and we're better positioned for the future."
How NCSA Offers Cluster Computing to the Scientific Community
To obtain access to the NCSA facilities, research teams submit applications describing the project they want to pursue. A national review board awards access based on peer reviews. Once they receive approval, users log in over the Internet and begin their development work, compiling code, debugging, and testing it. Then they submit a job request to the batch queuing system that accesses the cluster machines. The queuing system allocates the work, initiates the run, routes the output data back to the storage system, and finishes the job.
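A job request to such a batch queuing system typically takes the form of a short submission script. The sketch below is a hypothetical PBS-style script; the job name, queue, resource limits, and program paths are illustrative assumptions, not NCSA's actual configuration.

```shell
#!/bin/sh
# Hypothetical batch submission script (PBS-style directives).
#PBS -N cluster-run            # job name
#PBS -l nodes=64:ppn=2         # request 64 nodes, 2 processors per node
#PBS -l walltime=12:00:00      # wall-clock time limit
#PBS -q batch                  # submission queue

cd $PBS_O_WORKDIR
# Launch 128 MPI processes across the allocated nodes; output is
# routed back to storage when the job completes.
mpirun -np 128 ./my_simulation input.dat > output.log
```

The user submits this script once; the queuing system then handles allocation, launch, and cleanup without further interaction.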
The challenge in extracting high performance from clusters of systems is in distributing the applications so that they run efficiently across large numbers of processors. "You want to partition computational work in ways that the pieces can be coordinated and not be dominated by a small number of bottlenecks," says Reed. "As you move to a very large scale, that is a major challenge. It requires collaboration among domain scientists and engineers, as well as computer scientists, to look at how the characteristics of the application interact with hardware, and what kind of support tools are necessary to achieve high performance." IBM Global Services has an engineer on site to help fine-tune the Itanium-based system cluster's performance and observe characteristics of the large-scale cluster that will help in future development.
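The partitioning Reed describes is commonly done by domain decomposition: the simulation grid is split into contiguous subdomains, each processor updates its own slice, and neighbors exchange only the boundary ("ghost") cells they need. A minimal sketch of the idea, with made-up function names and a simulated (single-process) set of subdomains:

```python
# Domain decomposition sketch: split a 1-D grid into subdomains, give
# each one a ghost cell from its neighbors (the data MPI messages would
# carry), update locally, then stitch the owned points back together.
# Subdomains are assumed to hold at least two points each.

def smooth_serial(u):
    # Three-point average over the whole grid; endpoints are unchanged.
    return ([u[0]]
            + [(u[i-1] + u[i] + u[i+1]) / 3.0 for i in range(1, len(u) - 1)]
            + [u[-1]])

def smooth_decomposed(u, nparts):
    n = len(u)
    result = []
    for p in range(nparts):
        # This subdomain owns global indices [lo, hi).
        lo, hi = p * n // nparts, (p + 1) * n // nparts
        # Extend by one ghost cell on each interior boundary.
        glo, ghi = max(lo - 1, 0), min(hi + 1, n)
        local = u[glo:ghi]
        smoothed = smooth_serial(local)
        # Keep only the points this subdomain owns, dropping ghosts.
        result.extend(smoothed[lo - glo : len(local) - (ghi - hi)])
    return result
```

Because each ghost cell supplies exactly the neighbor data the stencil needs, the stitched result matches the serial computation for any number of subdomains; the coordination cost is one small boundary exchange per neighbor, which is what keeps a well-partitioned code from being dominated by communication bottlenecks.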
Performance Results of Itanium Processor-Powered IntelliStation Clusters
NCSA's cluster development team worked with the Intel Itanium architecture for several months in collaboration with IBM, Microsoft, and Intel Corporation. A team of researchers at the NCSA ported four high-performance research codes to Intel's Itanium processor. The four codes are: Cactus; MILC (for MIMD Lattice Computation); sPPM (for simplified Piecewise Parabolic Method); and a version of the General Atomic and Molecular Electronic Structure System (GAMESS). The codes have already achieved record performance levels on an Itanium-based system.
Preliminary results show that Cactus, MILC and sPPM have all achieved in excess of 650 megaflops on a single processor in a prototype Itanium-based system running 64-bit Linux. Cactus is a parallel toolkit used in astrophysics and several other scientific disciplines. MILC is a code used by a nationwide group of physicists at nine academic institutions who study some of the most fundamental questions of the universe. sPPM is used primarily in astrophysics and defense applications.
Astrophysicist Ed Seidel and the Cactus team at the Max Planck Institute for Gravitational Physics in Potsdam, Germany, have run Cactus on a number of systems and have seen its best performance on the Itanium-based system. "Our very complex calculations for black hole collisions can be carried out faster and on larger scales than ever before, leading to the more accurate predictions of gravitational wave signals needed to better understand Einstein's theories," said Seidel.
Steven Gottlieb, a member of the MILC collaboration at Indiana University, referring to the 64 lattice results, added, "The early performance levels are a good sign for the nationwide MILC collaboration. We are working on one of the most numerically demanding applications in scientific computation. We are quite pleased with these early results."
Paul Woodward, a National Computational Science Alliance researcher at the University of Minnesota who uses sPPM on NCSA computing systems, said he has run sPPM on a variety of systems, and the Itanium-based system outperforms them all. "My group has tested the performance of the sPPM code on a wide variety of microprocessors and the Itanium processor is significantly better than any other processor we've tested," said Woodward. "We have a highly scalable code and we expect performance on large Itanium processor-based clusters to be excellent."
The Last Word
"Seeing this level of performance on pilot systems shows that our collaborative efforts are already paying off," said Rob Pennington. "These early positive results confirm what we have been anticipating from the beginning: The Intel Itanium processor is going to be one of the key platforms in high-performance computing."