Single-threaded vs. Multi-threaded position statements
Moderator: Joel Emer (MIT and NVIDIA)
Panelists: Yale Patt (University of Texas, Austin), Mark Hill (University
of Wisconsin, Madison)
- Joel Emer - Often in the past, as many of you know, the Iron Law of performance, which related the total number of instructions, the number of instructions per cycle, and the cycle time (sketched after this statement), was used to characterize and contrast the performance of different systems. The factors in that equation essentially emphasized the number of instructions being processed at once and the latency of those instructions. In the uniprocessor era, tradeoffs among those factors led to a variety of architectures, where some systems focused on more instructions in flight and others on improved latency. The demise of Dennard scaling added power as a major consideration in how those objectives were achieved. Among other things, that
resulted in an increased focus on multi-core systems, which provided more
instructions in flight with less overhead. But the more constrained forms
of parallelism of such systems also carried efficiency challenges and a
diminished domain of applicability (both intrinsic and due to
programmability difficulties). The decline of Dennard scaling also led
to the first incarnation of this panel in 2007 where the panelists debated
the right direction for architecture research. Now, the demise of Moore's
Law is bringing an increased focus on the most effective use of every transistor
on a die. The popular Roofline model (also sketched below) now shifts the focus from instructions to operations, which are the essential constituents of
computation. So again we need to consider the tradeoffs among architecture
choices that vary in the number of operations that can be in flight, the
transistor and energy costs of launching those operations, and the
efficiency (and programmability) of a particular design choice across a
range of applications. To shed light on these choices, we bring back the
2007 panelists to debate the right directions for the future...
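For reference, here is a minimal sketch of the two performance models mentioned above, in the forms they are commonly written; the symbols (T_exec, IPC, I, P_peak, B_mem) are conventional choices for this note, not terms taken from the panel:

\[ T_{\text{exec}} \;=\; \frac{\text{instructions}}{\text{program}} \times \frac{\text{cycles}}{\text{instruction}} \times \frac{\text{seconds}}{\text{cycle}} \qquad \text{(Iron Law; IPC is the inverse of cycles per instruction)} \]

\[ P_{\text{attainable}}(I) \;=\; \min\!\bigl(P_{\text{peak}},\; B_{\text{mem}} \cdot I\bigr) \qquad \text{(Roofline, with arithmetic intensity } I \text{ in operations per byte)} \]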
- Yale Patt - A lot of multicore is frankly silly,
and there is no way the software will take advantage of it. For
those places where the software can exploit many cores, single-thread performance is
EVEN MORE important than 12 years ago, even though the number of cores has
increased a lot, thanks to Moore's Law not yet being dead.
Obvious places are Amdahl's Law (sketched below), critical sections, and lagging threads,
for example. But as long as we do not encourage students to work on
concepts that will improve single-thread performance (and there are plenty of avenues to pursue, including my continual rant on breaking through the levels of transformation, recently captured as well in Charles Leiserson et al.'s "There's Plenty of Room at the Top"), we will continue to see no improvement in single-thread performance.
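Since the statement leans on it, here is the commonly quoted form of Amdahl's Law; p and N are conventional symbols for this sketch, not the speaker's:

\[ \text{Speedup}(N) \;=\; \frac{1}{(1 - p) + p / N} \]

where p is the parallelizable fraction of the work and N is the number of cores. As N grows, the serial fraction (1 - p), whose cost is set by single-thread performance, comes to dominate the achievable speedup.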
- Mark Hill - The history of computer architecture is
that of harnessing more (and faster) transistors (Moore’s Law) in parallel
to yield improved performance at similar cost and power. The 20th Century
saw tremendous single-threaded performance gains through increasing instruction-level parallelism, even at a quadratic cost in area and/or power (Pollack's Rule, sketched after this statement). 20th Century ILP successes relegated other
creative techniques, such as data-level parallelism (SIMD, vectors) and
thread-level parallelism, to niche roles. The 21st Century is and will
continue to be different, because the substantial demise of transistor
power scaling (Dennard Scaling) renders further exploitation of Pollack’s
Rule untenable. The 2000s saw a turn to thread-level parallelism with
multicore chips, the 2010s are seeing expanded use of data-level
parallelism with general-purpose GPU Single Instruction Multiple Thread
(SIMT), and I predict that the 2020s will see wide exploitation of accelerator-level parallelism, wherein multiple accelerators are used concurrently, as already happens today with smartphone Systems on a Chip (SoCs).
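As a reference for the quadratic-cost argument above, Pollack's Rule in the form it is usually stated (an empirical rule of thumb rather than an exact law):

\[ \text{single-thread performance} \;\propto\; \sqrt{\text{core area (design complexity)}} \]

so doubling a core's area buys only about a 1.4x single-thread gain, which is why, once Dennard scaling faded, spending the same transistors on more cores, wider data-parallel units, or accelerators became the more attractive use of the die.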