Clusters are the fruit of decades of labor in the HPC community striving towards two goals: increased performance and lowered cost. Our primary tool in seeking ever-higher pinnacles of performance has been maximizing the use of parallelism. The most effective fundamental strategy to drive down the cost of HPC computing has been the employment of standardized, high-volume, and near-commodity technology wherever possible.
Performance
The earliest mainstream uses of parallelism in HPC, vector machines and high-performance SMP and MPP systems, were expensive compared to general-purpose computing systems but able to reach the heights of performance sought by users. Standard, high-volume technologies are more capable today, allowing excellent levels of performance to be attained without resorting to the exotic and expensive approaches of the past.
Those earlier machines depended upon purpose-built circuits, communications paths and memories. That was appropriate given the cost-is-no-object mentality of the race for pure performance, but it put these machines out of the reach of many potential users. The broader market for HPC also has an interest in more affordable systems.
Cost
Clusters, built with a maximum of cost-effective, standard building blocks and a minimum of expensive or custom components, provide parallelism in much the same way as an MPP machine. In addition, they make use of widely available general networking technology for the communications paths between processors.
Clusters are now gaining a bigger share of the HPC market, being used where previously a more expensive alternative like MPP would have been required. This newfound popularity stems from several sources. First, continued invention of more parallel algorithms, and diffusion into the market of more programs implementing them, have increased the number of problems for which parallel acceleration is possible. Second, standard high-volume networking technologies have become very high performance, supporting effective scaling for parallel execution for more classes of problems than in the past. Third, innovations have been applied to clustering that increase scaling, reduce delays and improve manageability. The sum of these trends yields a far wider applicability for clusters in the HPC market than had existed previously.
Would all clusters be considered equal? Do users have options when implementing cluster solutions?
Processors
Node Size
Networking Requirements
Application Type
Memory Accessibility
However, a relatively modern technique for sharing memory called Non-Uniform Memory Access (NUMA) broke through the SMP barriers, allowing larger numbers of nodes to share a single memory. These NUMA-based shared memory configurations may be the right solution for classes of applications whose scaling would be limited on shared-nothing designs.
Even with clusters having shared memory, there can be differences in how it is implemented. One type may have separate memory in each node, with a separate copy of the operating system running on each node. Others may implement a fully shared, common memory for all nodes.
One class of application that may demand a full shared memory model would have unpredictable, wide-ranging and high-intensity access to data. Because the shared memory system executes just a single copy of the operating system across all the nodes, the in-memory buffer pool is accessible by all. As soon as a node updates some part of the data, that changed data is held in the shared buffer for all to read. Rather than having each node issue a separate read to place the data in the buffer pool in its memory, one read by any node places the data where every node can then access it in memory.
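The buffer pool behavior described above can be sketched as a small simulation. This is only an illustration under stated assumptions, not a model of any particular system: the `BufferPool` class, the `count_disk_reads` function and the dictionary standing in for storage are all hypothetical names introduced here. It counts how many storage reads occur when every node touches the same pages, first with one pool shared by all nodes and then with a separate pool per node.

```python
class BufferPool:
    """Hypothetical in-memory page cache backed by a storage dictionary."""

    def __init__(self, storage):
        self.storage = storage
        self.pages = {}
        self.disk_reads = 0

    def read(self, page_id):
        # Go to storage only when the page is not already buffered.
        if page_id not in self.pages:
            self.pages[page_id] = self.storage[page_id]
            self.disk_reads += 1
        return self.pages[page_id]


def count_disk_reads(num_nodes, page_ids, shared):
    """Total storage reads when every node accesses every page in page_ids."""
    storage = {pid: f"data-{pid}" for pid in set(page_ids)}
    if shared:
        # One pool visible to all nodes (the full shared memory model).
        pool = BufferPool(storage)
        pools = [pool] * num_nodes
    else:
        # One private pool per node (separate memory in each node).
        pools = [BufferPool(storage) for _ in range(num_nodes)]

    for pool in pools:
        for pid in page_ids:
            pool.read(pid)

    # set() removes the repeated references to the single shared pool.
    return sum(p.disk_reads for p in set(pools))


if __name__ == "__main__":
    print("shared pool:  ", count_disk_reads(4, [1, 2, 3], shared=True))
    print("separate pools:", count_disk_reads(4, [1, 2, 3], shared=False))
```

With four nodes and three pages, the shared pool needs one read per page, while the separate pools need one read per page per node. Real systems add locking, eviction and coherence traffic that this sketch ignores.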
Programming Approaches
Software Availability
So, no, not all clusters are equal. A business should look closely at its needs, the intended applications to be run, and the specific characteristics of the candidate systems before choosing one to purchase.
What kind of challenges might a user experience with different approaches to clusters?
A cluster whose design or software complement does not match the tools, programming languages, programming interfaces or other aspects of the existing application may require serious changes to support a given application. If the application is procured from a software vendor, it may not even be possible to get the application to run on the existing cluster machine you own. For example, an off-the-shelf application compiled to run on SPARC processors will not execute without recompilation on Itanium-based machines.
A large cluster installation running a complex mix of different applications requires very good tools to manage them, to ensure good service for the applications and high utilization of the entire complex. If the tools available with a given cluster system are inadequate, users may experience frequent delays, aborted runs, erratic operation, extended outages and many idle nodes. Attempting to substitute people to overcome the defects of the management software can be a very expensive band-aid.
Because the cluster design can rule out certain programming approaches, users may have codes that are almost unsalvageable on those clusters, due to the radical changes that would be necessary.
If a cluster forces a business to adopt new operating systems, middleware, or programming languages, the impact can be high. Productivity goes down, training requirements push aside productive tasks, errors increase due to inexperience, and costs often balloon as well.
How do you see clusters evolving in the next 5 years?
The high-volume, standardized, near-commodity networking technologies will continue to grow faster, permitting those applications that today are limited to MPP machines to run well on more cost-effective clusters. A few years' progress following Moore's and Gilder's laws (that the capabilities of chips and bandwidth, respectively, double in specific short periods) will give us extremely fast and wide pipes to connect the many nodes in clusters.
Scaling approaches across general networks to machines with multiple owners, grid in other words, will drive improvements in management and control software that will be immediately applicable to clusters as well. Since clusters will not involve the issues of priority, funding, security and so forth that occur in a grid setting, clusters will remain a more widespread solution than grid.
The market momentum of Linux®, already the most popular OS for cluster systems, will bring with it a larger pool of applications that can be run on clusters with little or no modification.
The processors themselves and the memory technology in the nodes will also compound as suggested by Moore's law, allowing the same sized task to be accomplished across fewer and fewer nodes as those nodes get faster. Clusters with modest numbers of nodes will provide staggering levels of computing power, while the scalability of cluster technology with those fast future nodes will reach new Olympian peaks.
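The compounding argument can be made concrete with a little arithmetic. The sketch below is a rough illustration only: the function name `nodes_needed` is hypothetical, the doubling period is a parameter the reader must supply (the interview gives no figure, and the 1.5 years used in the example is purely an assumption), and it assumes the task scales perfectly, so a node that is twice as fast does exactly twice the work.

```python
import math


def nodes_needed(nodes_today, years, doubling_period_years):
    """Nodes required for the same task after `years` of compounding.

    Assumes per-node capability doubles every `doubling_period_years`
    and that the workload scales perfectly with node speed.
    """
    speedup = 2 ** (years / doubling_period_years)
    return math.ceil(nodes_today / speedup)


if __name__ == "__main__":
    # Assumed figures for illustration: a 256-node task, 1.5-year doubling.
    for years in (0, 3, 6):
        print(years, "years:", nodes_needed(256, years, 1.5), "nodes")
```

Under those assumed figures, two doublings in three years reduce a 256-node task to 64 nodes. In practice communication overhead and memory bandwidth would make the reduction less clean.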
Source: Gartner
©2004 Silicon Graphics, Inc. All rights reserved. Silicon Graphics, SGI, Altix, the SGI logo and the SGI cube are registered
trademarks and The Source of Innovation and Discovery are trademarks of Silicon Graphics, Inc., in the U.S. and/or other
countries worldwide. Linux is a registered trademark of Linus Torvalds in several countries. Intel and Itanium are
trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. All
other trademarks mentioned herein are the property of their respective owners.
Technology Insight is published by SGI. Editorial supplied by SGI is independent of Gartner analysis. All Gartner research is © 2004 by Gartner, Inc. and/or its Affiliates. All rights reserved. All Gartner materials are used with Gartner's permission and in no way does the use or publication of Gartner research indicate Gartner's endorsement of SGI's products and/or strategies. Reproduction of this publication in any form without prior written permission is forbidden. The information contained herein has been obtained from sources believed to be reliable. Gartner disclaims all warranties as to the accuracy, completeness or adequacy of such information. Gartner shall have no liability for errors, omissions or inadequacies in the information contained herein or for interpretations thereof. The reader assumes sole responsibility for the selection of these materials to achieve its intended results. The opinions expressed herein are subject to change without notice.
Inside this issue Trends in Cluster Computing, Exclusive Gartner Analyst Interview with Carl Claunch
Cluster Architectures for Case Study: Large-node Clusters for Improved Product and Process Deployment