Research of A.K. Uht
Project Areas
In this area I am mainly concerned with improving uniprocessor performance on branch-intensive (general-purpose) code through microarchitectural enhancements, especially the exploitation of large amounts of Instruction-Level Parallelism (ILP, the concurrent execution of machine instructions). The work is applicable to all processors, including Intel x86 architectures. Two key enabling ideas are Disjoint Eager Execution and Minimal Control Dependencies. Used together, they exhibit ILPs of tens of instructions per cycle. They may also be used to realize high IPCs (instructions per cycle); see below. "Levo" is our prototype high-IPC uniprocessor computer. It uses a Resource Flow Execution model, realizes Disjoint Eager Execution and Minimal Control Dependencies, and will use roughly 32-64 processing elements to achieve a target of tens of instructions per cycle. In recent simulations, a Levo model with realistic resources, including a high-latency main memory, exhibited a harmonic mean IPC of about 5-6 on 10 SPECint benchmarks, including gcc. This was joint work with Prof. Kaeli's NUCAR group at Northeastern University. We are currently designing a flexible prototype, both for future architectural studies and to provide a true proof-of-concept machine. Patents applied for. Contact me for licensing information.
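The harmonic mean is the appropriate average for IPC figures: when each benchmark executes the same number of instructions, it equals total instructions divided by total cycles. The minimal Python sketch below checks that equivalence. The two IPC values and the function name `aggregate_ipc` are made-up placeholders for illustration, not Levo results.

```python
from statistics import harmonic_mean

def aggregate_ipc(ipcs, instructions_per_benchmark=1_000_000):
    """Total instructions / total cycles, assuming every benchmark
    executes the same (hypothetical) number of instructions."""
    total_instructions = instructions_per_benchmark * len(ipcs)
    # Each benchmark needs instructions / IPC cycles to run.
    total_cycles = sum(instructions_per_benchmark / ipc for ipc in ipcs)
    return total_instructions / total_cycles

# Illustrative IPC values only, not measured data.
sample_ipcs = [4.0, 8.0]
print(aggregate_ipc(sample_ipcs))            # approximately 5.333
print(harmonic_mean(sample_ipcs))            # approximately 5.333, the same value
print(sum(sample_ipcs) / len(sample_ipcs))   # 6.0: the arithmetic mean overstates
```

The arithmetic mean gives too much weight to the fast benchmark. The harmonic mean reflects the total time actually spent.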
We improve synchronous digital system performance (ANY synchronous digital system) through the use of Timing Error Avoidance, or TEAtime. The key idea is to use a "tracking logic" circuit that mimics the structure and wiring of the critical path in a synchronous system but has a small added delay. The clock is continuously sped up until an error is detected in the tracking logic. Because of the added delay, this happens before any error has occurred in the real logic. The clock is then slowed down until the error disappears, and the process repeats.
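The control loop just described can be modeled cycle by cycle. This is a minimal Python sketch, not the TEAtime circuit. The delays, the 100 ps tracking margin, the 50 ps adjustment step, and the name `teatime_simulate` are all hypothetical, and a real implementation adjusts a clock generator rather than an integer period. The model shows why the added tracking delay matters: as long as the adjustment step is no larger than the margin and delays drift slowly, the real logic never sees a timing error.

```python
def teatime_simulate(delays_ps, margin_ps, start_period_ps, step_ps):
    """Behavioral model of a TEAtime-style clock controller.

    delays_ps[k] is the (hypothetical) real critical-path delay in cycle k.
    The tracking logic mimics that path, so it is modeled as the same delay
    plus margin_ps. Returns the clock period used in each cycle and the
    number of cycles on which the real logic would have failed timing.
    """
    period = start_period_ps
    periods = []
    real_errors = 0
    for real_delay in delays_ps:
        tracking_delay = real_delay + margin_ps  # tracking path is slightly slower
        periods.append(period)
        if real_delay > period:
            real_errors += 1  # stays zero if step <= margin and delays drift slowly
        if tracking_delay > period:
            period += step_ps  # tracking error detected: slow the clock down
        else:
            period -= step_ps  # no error: keep speeding the clock up
    return periods, real_errors

# 60 cycles with a 1000 ps critical path, then the chip cools to 800 ps.
periods, errors = teatime_simulate([1000] * 60 + [800] * 60,
                                   margin_ps=100, start_period_ps=2000, step_ps=50)
print(min(periods[:60]), max(periods[40:60]))   # 1050 1100
print(min(periods[100:]), max(periods[100:]))   # 850 900
print(errors)                                   # 0
```

The period settles just above the real path delay and follows it down when the path speeds up, for example when the ambient temperature drops.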
TEAtime thus dynamically adjusts performance to a system's present operating conditions and to its past manufacturing conditions. The win is that the system can operate using TYPICAL timing delays, which are usually about half of worst-case delays, thereby often doubling the operating clock frequency. For example, it allows a system operated at room temperature to run at frequencies greater than its designed worst-case frequency. It also speeds up the system clock when the ambient temperature goes down. A prototype has been built and demonstrated.
An earlier approach was the use of TIMing ERRor TOLeration, or TIMERRTOL. The basic idea is to run a digital system (ANY synchronous digital system) faster and faster until a timing error is detected. The computation is then corrected, wasting a cycle or two, and continues. If the error rate grows so large that performance suffers, the clock frequency is reduced. See our TIMERRTOL Tech. Report for more info. (Only the pipelined solution is truly of interest.) Patent applied for. Contact me for licensing information.
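The TIMERRTOL trade-off can be illustrated with a simple throughput model. Everything in this Python sketch is hypothetical. It assumes that each cycle exercises one path drawn from a made-up delay distribution and that every timing error costs a fixed number of correction cycles. The greedy search in `timerrtol_tune` (a name introduced here) stands in for whatever mechanism the actual design uses.

```python
from fractions import Fraction

def throughput(period_ps, path_delays_ps, penalty_cycles):
    """Useful cycles per picosecond at a given clock period (hypothetical model).

    A cycle fails timing if the path it exercises is slower than the period.
    Each failure costs penalty_cycles extra cycles to correct.
    """
    failing = sum(1 for d in path_delays_ps if d > period_ps)
    error_rate = Fraction(failing, len(path_delays_ps))
    return Fraction(1, period_ps) / (1 + error_rate * penalty_cycles)

def timerrtol_tune(path_delays_ps, penalty_cycles, start_period_ps, step_ps):
    """Shorten the clock period while net throughput keeps improving."""
    period = start_period_ps
    best = throughput(period, path_delays_ps, penalty_cycles)
    while period - step_ps > 0:
        candidate = period - step_ps
        t = throughput(candidate, path_delays_ps, penalty_cycles)
        if t <= best:
            break  # corrections now cost more than the faster clock gains
        period, best = candidate, t
    return period  # the last period that still improved throughput

# Made-up distribution: mostly short paths, a few long ones (worst case 1000 ps).
paths = [400] * 90 + [600] * 8 + [1000] * 2
print(timerrtol_tune(paths, penalty_cycles=2, start_period_ps=1000, step_ps=100))  # 400
```

At 300 ps every cycle would fail, and the two-cycle penalty would outweigh the faster clock. The search therefore keeps the 400 ps period, even though 10% of cycles there still need correction.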
We take the opposite tack with the use of Underclocking. We reduce a microprocessor's operating frequency BELOW its specified value to achieve various combinations of high reliability, low power consumption, load adaptation and disaster tolerance. For example, our TEAPC prototype uses modern formal feedback control theory to operate a Pentium 4 3.0 GHz chip at frequencies as low as 1.1 GHz and at lower supply voltages. At that operating point TEAPC remains at least basically functional (Web surfing, PowerPoint presentations, Sandra, etc., still run), although we have not conducted extensive validation studies, and it saves about 30-40% of the power normally consumed. When TEAPC is operated at room temperature at this low-frequency, low-voltage operating point, the CPU fan need not be on at all (a case of disaster tolerance). A standard OS, Windows 2000 SP4, is used.
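TEAPC's controller is not described here in detail, so the following is only a stand-in. It is a minimal Python sketch of closed-loop frequency control that uses a plain integral controller and a made-up first-order thermal model. All constants, including the 15 degrees C per GHz heating coefficient and the 55 degree setpoint, are hypothetical, as is the name `regulate_temperature`. It shows the general shape of the idea: the loop lowers the clock, within the 1.1-3.0 GHz range quoted above, until the chip settles at a temperature setpoint.

```python
def regulate_temperature(setpoint_c, steps, f_min_ghz=1.1, f_max_ghz=3.0,
                         ki=0.02, a=0.1, ambient_c=25.0, heat_c_per_ghz=15.0):
    """Integral (velocity-form) controller driving a hypothetical thermal model.

    Plant:      T[k+1] = T[k] + a * (ambient + heat * f[k] - T[k]),
                so the steady-state temperature is ambient + heat * f.
    Controller: f[k+1] = clamp(f[k] + ki * (setpoint - T[k])).
    Clamping the frequency itself also prevents integrator windup.
    """
    temp = ambient_c
    freq = f_max_ghz
    for _ in range(steps):
        error = setpoint_c - temp  # positive: cooler than the setpoint
        next_freq = min(f_max_ghz, max(f_min_ghz, freq + ki * error))
        temp += a * (ambient_c + heat_c_per_ghz * freq - temp)  # plant uses f[k]
        freq = next_freq
    return temp, freq

temp, freq = regulate_temperature(setpoint_c=55.0, steps=1000)
print(round(temp, 2), round(freq, 2))   # 55.0 2.0
```

With these constants the closed loop is stable (its poles have magnitude about 0.96), so after the initial overshoot the clock settles at the frequency that holds the setpoint.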
TEAPC also realizes adaptive computing in general, subsuming some of TEAtime's benefits. (In fact, TEAPC was motivated by the desire to realize TEAtime in a real computer.)
Conventional parallel programming leaves the hard work to the programmer. We seek to change this through a "radical" approach: let the hardware do the hard stuff, like parallelism detection and scheduling, and give the programmer an easy programming model, e.g., vanilla C. Our model machine is called "Teradactyl" and is composed of multiple Levo-like processors arranged in a ring. It's kind of like a macro-Levo formed of micro-Levos.
Various Research Topics
Disjoint Eager Execution (DEE) - This is an optimal form of speculative execution, in which execution resources (processing elements, etc.) are assigned only to the code whose results are most likely to be needed. DEE exhibits the best features of both Single Path (standard) speculative execution and Eager Execution (speculation down both paths of a branch). It was devised by Dr. Uht in 1990 at UCSD, and developed and simulated at URI from 1992 on. An animation and simulation data (as of December 1995) are available.
Minimal Control Dependencies (MCD) - This reduced form of control dependencies allows many instructions after a branch, including other branches, to be executed both in parallel and out of order with the branch.
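The DEE resource-assignment rule can be sketched with a priority queue. This minimal Python sketch makes simplifying assumptions that are ours, not necessarily Levo's: every branch has the same prediction accuracy, and each resource covers exactly one branch path. The name `dee_allocate` is hypothetical.

```python
import heapq
from fractions import Fraction

def dee_allocate(accuracy, resources):
    """Greedy Disjoint-Eager-Execution-style resource assignment.

    accuracy: per-branch prediction accuracy, given as a string or Fraction
    (e.g. "0.7") so the arithmetic stays exact. Each path is a string of
    'P' (predicted direction) and 'N' (not-predicted direction); its
    likelihood is the product of the per-branch probabilities. Resources go,
    one at a time, to the most likely path not yet covered.
    """
    p = Fraction(accuracy)
    # Heap entries: (negated probability, tie-breaker, path), a max-heap on probability.
    frontier = [(-p, 0, "P"), (-(1 - p), 1, "N")]
    heapq.heapify(frontier)
    counter = 2  # tie-breaker so the heap never compares path strings
    assigned = []
    while frontier and len(assigned) < resources:
        neg_prob, _, path = heapq.heappop(frontier)
        prob = -neg_prob
        assigned.append((path, prob))
        # The next branch along this path can go either way.
        for direction, q in (("P", p), ("N", 1 - p)):
            heapq.heappush(frontier, (-(prob * q), counter, path + direction))
            counter += 1
    return assigned

for path, prob in dee_allocate("0.7", 5):
    print(path, prob)
# P 7/10
# PP 49/100
# PPP 343/1000
# N 3/10
# PPPP 2401/10000
```

With 70% accuracy, the first three resources extend the predicted main line. The fourth goes to the not-predicted side of the first branch (3/10), because that path is more likely to be needed than a fourth main-line level (2401/10000). Single Path speculation would keep extending the main line, whereas Eager Execution would split resources evenly between both directions of each branch.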
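A much-simplified illustration of the MCD idea follows, covering forward branches only. It treats an instruction as control-dependent on a branch only if the branch can skip it, that is, only if the instruction lies strictly between the branch and its target. This domain rule and both function names are assumptions made for the sketch. It ignores backward branches (loops) entirely and is not necessarily the exact MCD definition. The naive model is included for contrast.

```python
def minimal_control_deps(targets):
    """Simplified minimal control dependencies (forward branches only).

    targets[i] is None for a non-branch, or the index a forward branch at i
    jumps to when taken. An instruction depends on a branch only if it lies
    strictly between the branch and its target. Instructions at or after
    the target execute either way, so they do not wait on that branch.
    """
    deps = [[] for _ in targets]
    for b, t in enumerate(targets):
        if t is None:
            continue
        if t <= b:
            raise ValueError("this sketch handles forward branches only")
        for i in range(b + 1, min(t, len(targets))):
            deps[i].append(b)
    return deps

def naive_control_deps(targets):
    """Conservative model: every instruction waits on every earlier branch."""
    branches_so_far = []
    deps = []
    for i, t in enumerate(targets):
        deps.append(list(branches_so_far))
        if t is not None:
            branches_so_far.append(i)
    return deps

# Branch at index 1 jumps to 4; branch at index 2 jumps to 3.
program = [None, 4, 3, None, None, None]
print(minimal_control_deps(program))  # [[], [], [1], [1], [], []]
print(naive_control_deps(program))    # [[], [], [1], [1, 2], [1, 2], [1, 2]]
```

Under the reduced rule, instruction 2 (itself a branch) depends only on branch 1, and instruction 3 does not wait on branch 2. Instructions 4 and 5 are independent of both branches, so they may execute in parallel and out of order with them.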
Other Information
Papers
URI Microarchitecture Research Institute (MuRI) - Our parent research organization.
High Performance Computing Laboratory - An affiliated organization for computer architecture research at URI.
October 8, 2004 | Gus Uht | uht@ele.uri.edu