
Faculty Candidate Seminar

Merrimac – High-Performance Highly-Efficient Scientific Computing with Streams

Dr. Mattan Erez

Dr. Erez is from Stanford University.
Advances in VLSI technology have made the raw ingredients for computation plentiful. Large numbers of fast functional units, along with ample memory and bandwidth, can be provided efficiently in terms of chip area, cost, and energy. However, high-performance computers realize only a small fraction of VLSI's potential. In this talk I will describe the Merrimac streaming supercomputer, which is being developed with an integrated view of the applications, software system, compiler, and architecture. I will show how this approach leads to an order-of-magnitude gain in performance per unit cost, unit power, and unit floor space for scientific applications, compared to common scientific computers designed around clusters of conventional CPUs. The talk will cover Merrimac's stream architecture, the mapping of scientific applications so that they run effectively on the stream architecture, and system issues in the Merrimac supercomputer.

The stream architecture is designed to take advantage of the properties of modern semiconductor technology (very high bandwidth over short distances and very high transistor counts, but limited global on-chip and off-chip bandwidth) and to match them with the characteristics of scientific codes (large amounts of parallelism and access locality). Organizing the computation into streams and exploiting the resulting locality using a register hierarchy enables a stream architecture to reduce the memory bandwidth required by representative computations by an order of magnitude or more. Hence a processing node with a fixed memory bandwidth (which is expensive) can support an order of magnitude more arithmetic units (which are inexpensive). Because each node has much greater performance (128 double-precision GFLOPS in our current design) than a conventional microprocessor, a streaming supercomputer can achieve a given level of performance with fewer nodes, reducing costs, simplifying system management, and increasing reliability.
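
The bandwidth argument above can be made concrete with a back-of-the-envelope model. The Python sketch below is illustrative only: the memory bandwidth and flops-per-word figures are assumed values chosen for easy arithmetic, not Merrimac's specifications, and the function name is hypothetical. It shows that if locality lets each word fetched from memory feed ten times as many operations, the same memory system can keep ten times as much arithmetic busy.

```python
def sustainable_flops(mem_words_per_s: float, flops_per_mem_word: float) -> float:
    """Arithmetic rate a node can sustain when memory bandwidth is the only limit.

    flops_per_mem_word is the number of floating-point operations performed
    for each word that actually has to come from off-chip memory.
    """
    return mem_words_per_s * flops_per_mem_word


# Assumed, illustrative values (NOT taken from the Merrimac design).
MEM_BANDWIDTH = 2.0e9        # words per second from off-chip memory (fixed, expensive)
BASELINE_INTENSITY = 2.0     # flops per memory word without stream locality
STREAM_INTENSITY = 20.0      # flops per memory word when intermediate values
                             # stay in the register hierarchy (10x less traffic)

baseline = sustainable_flops(MEM_BANDWIDTH, BASELINE_INTENSITY)
streamed = sustainable_flops(MEM_BANDWIDTH, STREAM_INTENSITY)
ratio = streamed / baseline

print(f"baseline: {baseline / 1e9:.0f} GFLOPS sustainable")
print(f"streamed: {streamed / 1e9:.0f} GFLOPS sustainable")
print(f"arithmetic supported per unit of memory bandwidth: {ratio:.0f}x")
```

The model ignores latency, on-chip bandwidth limits, and load imbalance; it captures only the proportionality between memory traffic per operation and the number of arithmetic units a fixed memory system can feed.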

Sponsored by

CSE Division