Challenges towards Exascale Computing

GPU Computing and the Road to Extreme-Scale Parallel Systems

Steve Keckler (NVIDIA)


While Moore's Law has continued to provide smaller semiconductor
devices, the effective end of uniprocessor performance scaling has
(finally) pushed mainstream computing to adopt parallel hardware and
software. Derived from high-performance programmable graphics
architectures, modern GPUs have emerged as the world's most
successful parallel architecture. Today, a single GPU has
a peak performance of over 500 GFlops and 148 GBytes/second of memory
bandwidth. The combination of high compute density and energy
efficiency (GFlops/Watt) has led the world's fastest supercomputers
to employ GPUs, including three of the top four on the November 2010
Top 500 list.  This presentation will first describe the
fundamentals of contemporary GPU architectures and the
high-performance systems that are built around them. I will then
highlight three substantial challenges that face the design of future
parallel computing systems on the road to Exascale: (1) the power
wall, (2) the bandwidth wall, and (3) the programming wall. Finally, I
will describe NVIDIA's Echelon research project, which is developing
architectures and programming systems that aim to address these
challenges and drive continued performance scaling of parallel
computing from embedded systems to supercomputers.


Steve Keckler joined NVIDIA in December 2009, where he serves as Director of
Architecture Research.  He is also Professor of both Computer Science
and Electrical and Computer Engineering at the University of Texas at
Austin, where he has served on the faculty since 1998. His research
team at UT-Austin developed scalable parallel processor and memory
system architectures, including non-uniform cache architectures;
explicit data graph execution processors, which merge dataflow
execution with sequential memory semantics; and micro-interconnection
networks to implement distributed processor protocols. All of these
technologies were demonstrated in the TRIPS experimental computer
system. Keckler was previously at the Massachusetts Institute of
Technology from 1990 to 1998, where he led the development of the
M-Machine experimental parallel computer system. He is a Fellow of the
IEEE, an Alfred P. Sloan Research Fellow, and a recipient of the NSF
CAREER Award, the ACM Grace Murray Hopper Award, the President's
Associates Teaching Excellence Award at UT-Austin, and the Edith and
Peter O'Donnell Award for Engineering. He earned a BS in Electrical
Engineering from Stanford University and an MS and a Ph.D. in Computer
Science from the Massachusetts Institute of Technology.