The Danforth Center* needed as much computing power as it could buy for $2 million and still fit into its computer room. They opted for a fast, reliable and cost-effective 520-node Intel® architecture-based Linux cluster.
Danforth: Planting a Cluster
Company Profile
The study of genetics has come a long way since the 1850s and 60s, when Gregor Mendel's experiments with peas led him to postulate the existence of something he called a gene. But, as forward-thinking as Mendel was, he probably never dreamed that science would not only confirm the gene's existence but would also use information gleaned from detailed genomic analysis to dramatically transform everything from agricultural production to disease treatments - and turn biology into big business.
In today's red-hot field of biotechnology, disease resistant crops, personalized cancer treatments and plants that devour environmental toxins are all on the horizon. But it takes computing power - lots of it - to churn through the complex mathematical equations and massive data sets that underlie these efforts.
The Donald Danforth Plant Science Center* in St. Louis, Missouri, is at the forefront of state-of-the-art plant genomics. Established in 1998, the Center is renowned for advanced techniques of computational plant science as well as its leadership in applied experimental efforts.
For example, the Center is home to the International Laboratory for Tropical Agricultural Biotechnology* (ILTAB), which was established in 1991 and is headed by Dr. Roger N. Beachy, the director of the Danforth Center, and Dr. Claude Fauquet.
Among other projects, ILTAB scientists are developing disease-resistant strains of cassava, a root crop that's the third largest source of calories in many developing nations. Half of the 70 million tons of cassava grown in Africa each year are lost to viral plant diseases. Working with the International Center of Tropical Agriculture*, ILTAB scientists hope to create hardier varieties of cassava that will be able to resist the most common viruses - and have a significant impact on the level of hunger in sub-Saharan Africa.
Opportunity
Studying Proteins - And the Foundations of Life
If something happens inside an organism, there's a good chance a protein made it happen. Proteins move substances around, control what gets inside a cell, regulate functions within the cell - they're the work engines of living organisms.
Understand the structure and function of proteins, and you're well on your way to knowing how plant hormones regulate a plant's growth and development, how plants resist bacterial and viral diseases, how they sense and react to increases or decreases in moisture, heat or light, or how they manufacture useful chemicals.
Such information then becomes a jumping-off point for a wide range of potentially groundbreaking uses - from plants that can clean up heavy metals in the environment, to plants that serve as cost-effective "farms" for manufacturing pharmaceutical substances.
At the Danforth Center's Laboratory of Computational Genomics*, a team headed by Dr. Jeffrey Skolnick is developing tools for comparing and interpreting the information that results from genome sequencing, particularly tools for identifying protein function from sequence.
"Performance was the driving factor on this purchase. I wanted to maximize the throughput on our codes, and the Intel® Pentium® III processors with full-speed cache gave us the best performance."
— Dr. Jeffrey Skolnick, Danforth Center
Protein Predictions
Skolnick's team is developing algorithms that can predict a protein's structure from its genomic sequence, and, with the addition of experimental data, predict the protein's function from the structural information. Using this information, the scientists can help experimentalists prioritize which genes and which proteins merit more detailed study.
That's where the computing power comes in. Gene studies are renowned for their complexity, and that's true even when the subject of study is plants. Case in point: Arabidopsis, a member of the mustard family, has the smallest known genome of any flowering plant - a mere 130 million base pairs. For more complex plants, the numbers rise rapidly: three billion base pairs for corn, five billion for barley, 16 billion for wheat.
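To put those base-pair counts in perspective, a quick back-of-the-envelope sketch of the raw storage they imply. This is purely illustrative arithmetic: the 2-bit-per-base encoding (just A/C/G/T, no annotation or quality data) is an assumption, and real genomic datasets are far larger once annotation and experimental data are included.

```python
# Raw storage implied by the genome sizes quoted above, assuming a
# minimal 2-bit encoding (A/C/G/T only) with no annotation overhead.
GENOME_SIZES_BP = {
    "Arabidopsis": 130e6,   # 130 million base pairs
    "corn": 3e9,
    "barley": 5e9,
    "wheat": 16e9,
}

def packed_size_mb(base_pairs: float, bits_per_base: int = 2) -> float:
    """Megabytes needed to store a sequence at the given bit density."""
    return base_pairs * bits_per_base / 8 / 1e6

for plant, bp in GENOME_SIZES_BP.items():
    print(f"{plant}: {packed_size_mb(bp):,.0f} MB")
```

Even packed this tightly, the wheat genome alone runs to roughly 4 GB of sequence data - before any of the comparative analysis the Skolnick team's tools perform on it.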
And sequencing the genes is just the beginning. In addition to identifying a gene's sequence, scientists also want to understand how genes affect the plant's characteristics. In other words, they want to use that genomic information to better understand the physiology, biochemistry, growth and developmental processes of plants and animals at the molecular level. They want to know how the various components interact and how they compare to similar components: what they interact with, what processes they're associated with, what they regulate or produce.
So it's no wonder biotechnology research has become one of the leading consumers of high performance computing MIPS (Millions of Instructions Per Second) - or that, when Skolnick went shopping for supercomputers, he wanted to get as much horsepower as his $2M budget would afford.
Solution
Sowing Success with Intel and Linux
At the Danforth Center, researchers are getting the computing power they need in a scaled-out Linux cluster that harnesses 1,040 Intel® Pentium® III processors. With 335 Gflops peak performance, it's believed to be the largest system of its kind devoted to plant science and the most powerful Intel® architecture-based Beowulf cluster ever. And talk about rapid deployment - Danforth's "Kilo Cluster" went from order placement to full production in under six weeks.
When the Danforth Center opened, Skolnick had a mix of Hewlett-Packard* and Silicon Graphics* workstations, but he knew he needed more computing power to tackle the tougher problems his team wanted to pursue. He installed a system with 120 Intel® Pentium® III processors at 400 MHz as a testing ground, but still needed more computing power.
Skolnick rejected RISC*-based solutions as being either too big, too slow or too expensive, and decided on a Beowulf cluster running the Linux operating system on the fastest Intel® processors then available, Intel Pentium III processors at 733 MHz and 750 MHz. And he found he could afford a system with 1,040 processors (520 nodes), giving him peak performance of 335 Gflops.
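The node and processor counts above can be sanity-checked with simple arithmetic. The per-processor figure below is merely what the quoted 335 Gflops aggregate implies when divided across the cluster; how Western Scientific actually counted peak flops per cycle is not stated in the article, so treat this as illustrative bookkeeping, not a processor specification.

```python
# Sanity-check the quoted cluster figures: 520 dual-processor nodes
# and a 335 Gflops aggregate peak.
NODES = 520
PROCS_PER_NODE = 2
PEAK_GFLOPS = 335.0

total_procs = NODES * PROCS_PER_NODE         # matches the quoted 1,040
per_proc_gflops = PEAK_GFLOPS / total_procs  # implied per-processor peak

print(f"{total_procs} processors")
print(f"~{per_proc_gflops:.2f} Gflops per processor (implied by the quoted peak)")
```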
"Performance was the driving factor on this purchase," recalls Skolnick. "I wanted to maximize the throughput on our codes, and the Pentium III processors gave us the best performance. The 733 MHz processors with full-speed cache are giving us 2.0 to 2.3 times the performance per processor as we were getting on the 400 MHz Pentium III processors with half-speed cache. Plus now we have 1,040 processors instead of 120."
System density was also a big issue - literally. "When you're dealing with this many processors, acreage becomes important," Skolnick says. "We had a space budget as well as a dollar budget, and we needed to stay within both figures."
Again, the Intel Pentium III processors filled the bill. Using Intel's advanced 0.18 micron manufacturing technology, the processors and motherboards were only 1/4 the size of some of the alternatives Skolnick considered. The small size allowed more processors to fit into each cabinet, and reduced the heating and floor space requirements.
Reliability and Rapid Time to Market
The Danforth Center chose San Diego, California-based Western Scientific* to custom-build, install and support the system. Western Scientific designs similar machines in a one-off fashion for a wide range of industrial and academic sites.
"When you have this many system elements, you worry that the law of large numbers is going to get you, but the Kilo Cluster is very reliable. If one node has a problem, the overall service isn't impacted. You just repair it or swap in a spare part."
— Dr. Jeffrey Skolnick, Danforth Center
Thanks to Western's expertise in high performance clusters and the use of open, industry-standard building blocks, the system was deployed quickly. Despite having 520 nodes and more than 150 discrete parts, the machine was in full production less than six weeks after the order was signed. (The secret? "Lots of very smart and dedicated employees," according to Jeff Johnson, Western Scientific's VP of Engineering.)
The Kilo Cluster quickly became a production workhorse. "This system just sits there and runs," says Skolnick. "When you have this many system elements, you worry that the law of large numbers is going to get you, but the Kilo Cluster is very reliable. If one node has a problem, the overall service isn't impacted. You just repair it or swap in a spare part."
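Skolnick's "law of large numbers" worry is easy to quantify. The sketch below assumes a 1% annual per-node failure probability with independent failures - both figures are invented for illustration, not measured on the Kilo Cluster - but it shows why, at 520 nodes, the design goal has to be tolerating individual node failures rather than preventing all of them.

```python
# Why node-swap tolerance matters at 520 nodes: even a small per-node
# failure probability makes *some* failure nearly certain.
# The 1% annual per-node failure probability is an assumed figure,
# and failures are assumed independent.
NODES = 520
P_NODE_FAIL = 0.01  # assumed annual failure probability per node

p_any_failure = 1 - (1 - P_NODE_FAIL) ** NODES
expected_failures = NODES * P_NODE_FAIL

print(f"P(at least one node fails in a year): {p_any_failure:.3f}")
print(f"Expected failed nodes per year: {expected_failures:.1f}")
```

Under these assumptions a failure somewhere in the cluster is a near-certainty each year, yet only a handful of nodes are expected to be down - exactly the regime where "repair it or swap in a spare part" keeps the overall service running.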
The use of the Linux operating system also contributes to the machine's cost effectiveness and reliability.
"One of the great things about this system is that it's not designed around proprietary resources, so you're not tied into a single company," says Mike Powell, VP of Sales at Western Scientific. "You have all the resources of the Linux world, plus the open Intel architecture marketplace. If you find a problem with the operating system, chances are somebody's already fixing it. And the cost of deploying Linux on multiple processors is very low."
"One of the great things about this system is that it's not designed around proprietary resources, so you're not tied into a single company. You have all the resources of the Linux world, plus the open Intel architecture marketplace."
— Mike Powell, Vice President, Sales, Western Scientific (systems integrator)
Finally, Intel architecture gave the Danforth Center a clear price/performance advantage. "The Danforth Center's 520-node system has roughly the same computational power as a Cray* T3E-1200/504, at about one-tenth the cost," says Powell. "It also has roughly a 4:1 price advantage over a comparable system made with Alpha* processors."
Although Western Scientific also builds high performance clusters based on proprietary architectures, Powell says their Intel architecture-based business is booming. "We're doing these systems right and left," Powell says. "This is a much more cost-effective way of doing computing."
Summary
Bigger Problems Waiting
At the Danforth Center, the Kilo Cluster is making a big difference in the scientists' ability to tackle tough problems. With the power of the Kilo Cluster, scientists can conduct protein studies within hours or days rather than weeks or months, and can conduct more experiments within a given amount of time.
"The Kilo Cluster is vital to our scientists' efforts to increase our understanding of the biology of plants," says Beachy. "This system will make it possible to modify plant metabolism in ways that lead to improved nutritional value, better drought tolerance, and increased insect and disease resistance in crops for the benefit of agriculture and human health. The addition of the Kilo Cluster will enable our scientists to substantially shorten the time required to elucidate protein function."
Like all true computational scientists, however, Skolnick and his team are already looking forward to their next machine. "We couldn't do this work without this much performance, but it's still not enough - it's never enough," says Skolnick. "You solve one problem, and there's always a bigger one waiting in the wings for you."
For now, though, Skolnick says, "I couldn't be happier. This machine lets us do science we couldn't do before. We have the power to fold all the proteins in a serious genome, and that's a real breakthrough."
Bottom line: "If you want to do cutting-edge science, you need good ideas and the tools to implement them," says Skolnick. "The Kilo Cluster gives us the tool."