Single-threaded vs. Multi-threaded position statements
Moderator: Joel Emer (MIT and NVIDIA)
Panelists: Yale Patt (University of Texas, Austin), Mark Hill (University
of Wisconsin, Madison)
- Joel Emer - Often in the past, as many of you know, the Iron Law of performance, which related the total number of instructions, the number of instructions per cycle, and the cycle time (sketched after this statement), was used to characterize and contrast the performance of different systems. The factors in that equation essentially emphasized the number of instructions being processed at once and the latency of those instructions. In the uniprocessor era, tradeoffs among those factors led to a variety of architectures, where some systems focused on more instructions in flight and others on improved latency. The demise of Dennard scaling added power as a major consideration in how those objectives were achieved. Among other things, that
resulted in an increased focus on multi-core systems, which provided more
instructions in flight with less overhead. But the more constrained forms
of parallelism of such systems also carried efficiency challenges and a
diminished domain of applicability (both intrinsic and due to
programmability difficulties). The decline of Dennard scaling also led
to the first incarnation of this panel in 2007 where the panelists debated
the right direction for architecture research. Now, the demise of Moore's
Law is bringing an increased focus on the most effective use of every transistor
on a die. The popular Roofline model (also sketched below) now shifts the focus from instructions to operations, which are the essential constituents of
computation. So again we need to consider the tradeoffs among architecture
choices that vary in the number of operations that can be in flight, the
transistor and energy costs of launching those operations, and the
efficiency (and programmability) of a particular design choice across a
range of applications. To shed light on these choices, we bring back the
2007 panelists to debate the right directions for the future...
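For reference, here is a minimal sketch of the two performance models mentioned above, in the forms they are commonly written; the symbols (T_exec, IPC, I, P_peak, B_mem) are conventional choices for this note, not terms taken from the panel:

\[ T_{\text{exec}} \;=\; \frac{\text{instructions}}{\text{program}} \times \frac{\text{cycles}}{\text{instruction}} \times \frac{\text{seconds}}{\text{cycle}} \qquad \text{(Iron Law; IPC is the inverse of cycles per instruction)} \]

\[ P_{\text{attainable}}(I) \;=\; \min\!\bigl(P_{\text{peak}},\; B_{\text{mem}} \cdot I\bigr) \qquad \text{(Roofline, with arithmetic intensity } I \text{ in operations per byte)} \]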
- Yale Patt - A lot of multicore is frankly silly,
and there is no way the software will take advantage of it. For
those places where the software can exploit many cores, single-thread performance is
EVEN MORE important than 12 years ago, even though the number of cores has
increased a lot, thanks to Moore's Law not yet being dead.
Obvious places are Amdahl's Law (sketched below), critical sections, and lagging threads,
for example. But as long as we do not encourage students to work on
concepts that will improve single-thread performance (and there are plenty of avenues to pursue, including my continual rant on breaking through the levels of transformation, recently captured as well in Charles Leiserson et al.'s "There's Plenty of Room at the Top"), we will continue to see no improvement in single-thread performance.
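Since the statement leans on it, here is the commonly quoted form of Amdahl's Law; p and N are conventional symbols for this sketch, not the speaker's:

\[ \text{Speedup}(N) \;=\; \frac{1}{(1 - p) + p / N} \]

where p is the parallelizable fraction of the work and N is the number of cores. As N grows, the serial fraction (1 - p), whose cost is set by single-thread performance, comes to dominate the achievable speedup.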
- Mark Hill - The history of computer architecture is
that of harnessing more (and faster) transistors (Moore’s Law) in parallel
to yield improved performance at similar cost and power. The 20th Century
saw tremendous single-threaded performance gains through increasing instruction-level parallelism, even at a quadratic cost in area and/or power (Pollack's Rule, sketched after this statement). 20th Century ILP successes relegated other
creative techniques, such as data-level parallelism (SIMD, vectors) and
thread-level parallelism, to niche roles. The 21st Century is and will
continue to be different, because the substantial demise of transistor
power scaling (Dennard Scaling) renders further exploitation of Pollack’s
Rule untenable. The 2000s saw a turn to thread-level parallelism with
multicore chips, the 2010s are seeing expanded use of data-level
parallelism with general-purpose GPU Single Instruction Multiple Thread
(SIMT), and I predict that the 2020s will see wide exploitation of accelerator-level parallelism, wherein multiple accelerators are used concurrently, as already happens today with smartphone Systems on a Chip (SoCs).
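As a reference for the quadratic-cost argument above, Pollack's Rule in the form it is usually stated (an empirical rule of thumb rather than an exact law):

\[ \text{single-thread performance} \;\propto\; \sqrt{\text{core area (design complexity)}} \]

so doubling a core's area buys only about a 1.4x single-thread gain, which is why, once Dennard scaling faded, spending the same transistors on more cores, wider data-parallel units, or accelerators became the more attractive use of the die.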