Research of A.K. Uht

Project Areas

Instruction Level Parallelism (ILP):

In this area I am mainly concerned with improving uniprocessor performance on branch-intensive (general-purpose) code through microarchitectural enhancements, especially the exploitation of large amounts of Instruction-Level Parallelism (the concurrent execution of machine instructions). The work is applicable to all processors, including Intel x86 architectures. Two key enabling ideas are Disjoint Eager Execution and Minimal Control Dependencies. Used together, they exhibit ILPs of tens of instructions per cycle. They may also be used to realize high IPCs (instructions per cycle). (See below.)

- [main paper] - "Levo" is our prototype high-IPC uniprocessor computer. It uses a Resource Flow Execution model, realizes Disjoint Eager Execution and Minimal Control Dependencies, and will use roughly 32-64 processing elements to execute tens of instructions per cycle (target). In recent simulations, a Levo model with realistic resources, including a high-latency main memory, exhibited a harmonic mean IPC of about 5-6 on 10 SPECint benchmarks, including gcc. This was joint work with Prof. Kaeli's NUCAR group at Northeastern University.

We are currently designing a flexible prototype to serve both as a platform for future architectural studies and as a true proof-of-concept machine. Patents applied for. Contact me for licensing information.
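The IPC figure above is a harmonic mean over benchmarks. As a minimal sketch of why that average is appropriate (the numbers below are hypothetical, not Levo results): when every benchmark executes the same number of instructions, the harmonic mean of the per-benchmark IPCs equals total instructions divided by total cycles. The function names `harmonic_mean_ipc` and `aggregate_ipc` are illustrative only.

```python
def harmonic_mean_ipc(ipcs):
    """Harmonic mean of per-benchmark IPC values: n / sum(1 / IPC_i)."""
    if not ipcs or any(x <= 0 for x in ipcs):
        raise ValueError("IPC values must be positive")
    return len(ipcs) / sum(1.0 / x for x in ipcs)


def aggregate_ipc(instructions_per_run, cycles):
    """Total instructions divided by total cycles across all benchmark runs."""
    return instructions_per_run * len(cycles) / sum(cycles)


# Hypothetical numbers for illustration only (not Levo simulation results).
N = 1_000_000                   # instructions executed by each benchmark
cycles = [200_000, 125_000]     # cycles taken by each benchmark
ipcs = [N / c for c in cycles]  # per-benchmark IPC: 5.0 and 8.0

print(round(harmonic_mean_ipc(ipcs), 4))   # 6.1538
print(round(aggregate_ipc(N, cycles), 4))  # 6.1538
```

An arithmetic mean of the same two IPCs (6.5) would overstate the throughput actually achieved over the combined run.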

Adaptive Computing:

We improve synchronous digital system performance (ANY synchronous digital system) through the use of Timing Error Avoidance, or TEAtime. The key idea is to use a "tracking logic" circuit that mimics the structure and wiring of the critical path in a synchronous system, and also has a small added delay. The clock is continuously sped up until an error is detected in the tracking logic; because of the added delay, this happens before any error can occur in the real logic. The clock is then slowed down until the error disappears, and the process repeats.

Thus, TEAtime dynamically adjusts performance to a system's present operating conditions and past manufacturing conditions. The win is that the system can operate with TYPICAL timing delays, which are often about half of worst-case delays, frequently doubling the operating clock frequency. For example, it allows a system operated at room temperature to run at frequencies greater than its designed worst-case frequency. It also speeds up the system clock when the ambient temperature drops. A prototype has been built and demonstrated.
[TEAtime Prototype Info (42MB)] / [readme (2KB)]
Patent applied for. Contact me for licensing information.
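The TEAtime control loop described above can be sketched as a toy simulation. This is an illustration under stated assumptions, not the prototype's circuit: delays are plain numbers in nanoseconds, the tracking logic is modeled as the real critical-path delay plus a fixed margin, the clock period changes by a fixed step per control decision, and `CriticalPath` and `run_teatime` are hypothetical names. As long as the step is no larger than the margin and delays change slowly, the period never falls below the real logic's delay.

```python
from dataclasses import dataclass


@dataclass
class CriticalPath:
    real_delay_ns: float  # current delay of the real critical path (hypothetical model)
    margin_ns: float      # small extra delay added to the tracking logic

    @property
    def tracking_delay_ns(self) -> float:
        # The tracking logic mimics the critical path plus a small added delay.
        return self.real_delay_ns + self.margin_ns


def run_teatime(path, period_ns, step_ns, steps):
    """Return the clock period used at each control step."""
    history = []
    for _ in range(steps):
        # Safety check: the real logic must never see a timing error.
        assert period_ns >= path.real_delay_ns, "real logic would see a timing error"
        history.append(period_ns)
        if period_ns < path.tracking_delay_ns:
            period_ns += step_ns  # error in tracking logic: slow the clock down
        else:
            period_ns -= step_ns  # no tracking error: speed the clock up
    return history


# Hypothetical numbers: 20 ns worst-case design period, 10 ns actual delay today.
path = CriticalPath(real_delay_ns=10.0, margin_ns=1.0)
periods = run_teatime(path, period_ns=20.0, step_ns=0.5, steps=40)
print(periods[-4:])  # [11.0, 10.5, 11.0, 10.5]

# The chip cools and its real delay drops; the loop follows it down.
path.real_delay_ns = 8.0
cooler = run_teatime(path, period_ns=periods[-1], step_ns=0.5, steps=40)
print(min(cooler))  # 8.5
```

In this toy model the period settles just around the tracking delay instead of at the assumed 20 ns worst-case period. A sudden delay increase larger than the margin would trip the safety assertion, which is why the margin and step size matter.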

An earlier approach was TIMing ERRor TOLeration, or TIMERRTOL. The basic idea is to run a digital system (ANY synchronous digital system) faster and faster until a timing error is detected. The computation is then corrected, wasting a cycle or two, and continues. If the error rate grows so large that performance suffers, the clock frequency is reduced. See our TIMERRTOL Tech. Report for more info. (Only the pipelined solution is truly of interest.) Patent applied for. Contact me for licensing information.
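The TIMERRTOL trade-off can be sketched as choosing the clock period that minimizes the average time per correct result: a shorter period helps until the cost of corrections (a cycle or two each, per the description above) outweighs the gain. The per-cycle delay distribution below is hypothetical, errors are counted analytically rather than detected by hardware, and all function names are illustrative.

```python
def error_rate(period_ns, cycle_delays_ns):
    """Fraction of cycles whose exercised path delay exceeds the clock period."""
    return sum(d > period_ns for d in cycle_delays_ns) / len(cycle_delays_ns)


def time_per_result(period_ns, cycle_delays_ns, penalty_cycles):
    """Average time per correct result when each timing error costs extra cycles."""
    return period_ns * (1 + error_rate(period_ns, cycle_delays_ns) * penalty_cycles)


def best_period(candidates_ns, cycle_delays_ns, penalty_cycles):
    """Pick the candidate clock period with the lowest average time per result."""
    return min(candidates_ns,
               key=lambda t: time_per_result(t, cycle_delays_ns, penalty_cycles))


# Hypothetical per-cycle delays: most cycles are fast, a few hit the 10 ns worst case.
delays = [5.0] * 90 + [7.0] * 8 + [10.0] * 2
candidates = [4.0, 5.0, 6.0, 7.0, 8.0, 10.0]
chosen = best_period(candidates, delays, penalty_cycles=2)
print(chosen)                            # 5.0
print(time_per_result(10.0, delays, 2))  # 10.0
```

With these made-up delays the 5 ns period wins (about 6 ns per result including corrections, versus 10 ns at the worst-case period), while a 4 ns period errs on every cycle and does worse than worst-case operation. Backing off the frequency when errors become too frequent corresponds to moving up the candidate list.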

Underclocking:

A common pastime of computer enthusiasts is overclocking, in which PC-based microprocessors are operated at frequencies above their nominal ratings in order to improve PC performance. This is often done with the help of the very dangerous practice of raising the supply voltages of both the CPU and the memory.

We take the opposite tack with Underclocking: we reduce a microprocessor's operating frequency BELOW its specified value to achieve various combinations of high reliability, low power consumption, load adaptation and disaster tolerance. For example, our TEAPC prototype uses modern formal feedback control theory to operate a Pentium 4 3.0 GHz chip at frequencies as low as 1.1 GHz and at reduced supply voltages. At that operating point TEAPC remains at least basically functional (Web surfing, PowerPoint presentations, Sandra, etc., still run), although we have not conducted extensive validation studies, and it saves about 30-40% of the power normally consumed. When TEAPC is operated at room temperature at this low-frequency, low-voltage point, the CPU fan need not be on (a case of disaster tolerance). A standard OS, Windows 2000 SP4, is used.

TEAPC also realizes adaptive computing in general, subsuming some of TEAtime's benefits. (In fact, TEAPC was motivated by the desire to realize TEAtime in a real computer.)
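TEAPC's control law is not specified here. As a hedged illustration of feedback-controlled underclocking, the sketch below uses a discrete integral controller that moves the clock frequency toward a target CPU utilization, clamped between the 1.1 GHz and 3.0 GHz figures quoted above. The utilization model, the 0.8 target, the gain, and the function names are assumptions, and voltage scaling is omitted.

```python
F_MIN_GHZ = 1.1  # lowest frequency quoted for the TEAPC Pentium 4
F_MAX_GHZ = 3.0  # the chip's rated frequency


def utilization(demand_ghz, freq_ghz):
    """Fraction of cycles kept busy by a workload needing demand_ghz of cycles (toy model)."""
    return min(1.0, demand_ghz / freq_ghz)


def integral_controller(demand_ghz, target_util=0.8, gain=1.0, steps=50, freq_ghz=F_MAX_GHZ):
    """Integral control: frequency accumulates gain * (utilization error) each step."""
    for _ in range(steps):
        error = utilization(demand_ghz, freq_ghz) - target_util
        # Busier than the target -> raise the clock; idler -> lower it.
        # Clamping to the allowed range also keeps the integrator from winding up.
        freq_ghz = min(F_MAX_GHZ, max(F_MIN_GHZ, freq_ghz + gain * error))
    return freq_ghz


print(round(integral_controller(1.2), 3))  # 1.5
print(integral_controller(0.5))            # 1.1
```

With a steady demand of 1.2 GHz worth of cycles, the loop settles near 1.5 GHz (80% busy); with a light 0.5 GHz demand it pins at the 1.1 GHz floor. A real controller would also have to respect temperature, voltage, and stability limits.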

Supercomputing:

The main problem facing supercomputing today is not the building of supercomputers per se, but rather their programming. It frequently takes months or years to program a current generation supercomputer, even with sophisticated compilers and software libraries. And at the end of that time, the supercomputer is out-of-date, and the program must be ported to a new machine, and so on.

We seek to change this through a "radical" approach: let the hardware do the hard stuff, like parallelism detection and scheduling, and give the programmer an easy programming model, e.g., vanilla C. Our model machine is called "Teradactyl", and is composed of multiple Levo-like processors arranged in a ring. It's kind of like a macro-Levo formed of micro-Levos.

Various Research Topics

Disjoint Eager Execution (DEE) - This is an optimal form of speculative execution, in which execution resources (processing elements, etc.) are only assigned to the code whose results are most likely to be needed. DEE exhibits the best features of both Single Path (standard) speculative execution and Eager Execution (speculation down both paths of a branch). It was derived/invented/discovered by Dr. Uht in 1990 at UCSD, and developed and simulated at URI from 1992 on.
- Animation - Note: takes a minute or two to start.
- Simulation Data - as of December 1995.
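The DEE assignment rule can be sketched as a greedy search over branch paths. Assuming (as a simplification) that every branch is predicted correctly with the same probability p, the code reached by a sequence of branch outcomes has a cumulative probability equal to the product of p for each predicted outcome and 1 - p for each mispredicted one; each new resource goes to the unassigned path with the highest cumulative probability. The function name `dee_assign` is hypothetical.

```python
import heapq
from itertools import count


def dee_assign(p_correct, resources):
    """Greedy Disjoint Eager Execution assignment (uniform prediction accuracy assumed).

    A path is a string of branch outcomes: 'P' = predicted direction, 'N' = not predicted.
    Returns (path, cumulative probability) for each resource, in assignment order.
    """
    tie = count()  # breaks ties between equal probabilities deterministically
    heap = [(-p_correct, next(tie), "P"), (-(1.0 - p_correct), next(tie), "N")]
    assigned = []
    while heap and len(assigned) < resources:
        neg_prob, _, path = heapq.heappop(heap)  # most likely unassigned path
        prob = -neg_prob
        assigned.append((path, prob))
        # The code on this path ends at the next branch: both successors become candidates.
        heapq.heappush(heap, (-(prob * p_correct), next(tie), path + "P"))
        heapq.heappush(heap, (-(prob * (1.0 - p_correct)), next(tie), path + "N"))
    return assigned


print([path for path, _ in dee_assign(0.7, 5)])  # ['P', 'PP', 'PPP', 'N', 'PPPP']
```

With p = 0.7 and five resources, the fourth resource goes to the not-predicted side of the first branch (cumulative probability 0.3), because that beats extending the predicted path to a fourth level (0.7^4 ≈ 0.24). Single Path speculation would take only predicted paths, and Eager Execution would split evenly at each branch. Because a child path is never more likely than its parent, the greedy choice always selects a connected set of the most likely paths.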

Minimal Control Dependencies (MCD) - This reduced form of control dependencies allows many instructions after a branch, including other branches, to be executed both in parallel and out of order with respect to the branch.
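A simplified model of minimal control dependencies, assuming only forward branches in straight-line code: an instruction depends on a branch only if it lies inside the branch's domain, taken here as the instructions strictly between the branch and its target. Code at or after the target executes regardless of the branch outcome, so it is independent of that branch. Loops (backward branches) are not handled, and the function name is hypothetical.

```python
def minimal_control_deps(n_instructions, branches):
    """Map each instruction index to the forward branches it is control dependent on.

    branches: dict {branch_index: target_index}, forward branches only (target > branch).
    An instruction depends on a branch only if it lies strictly between the branch
    and its target (the branch's domain in this simplified model).
    """
    deps = {i: [] for i in range(n_instructions)}
    for b, target in sorted(branches.items()):
        if target <= b:
            raise ValueError("this sketch handles forward branches only")
        for i in range(b + 1, min(target, n_instructions)):
            deps[i].append(b)
    return deps


# Hypothetical 8-instruction block: branch at 1 skips to 4, branch at 5 skips to 7.
deps = minimal_control_deps(8, {1: 4, 5: 7})
print(deps)
# {0: [], 1: [], 2: [1], 3: [1], 4: [], 5: [], 6: [5], 7: []}
```

Here instruction 5, itself a branch, does not depend on the branch at 1, so both branches can resolve in parallel; a model in which every instruction waits for all earlier branches would serialize them.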

Other Information

Papers

URI Microarchitecture Research Institute (MuRI) - Our parent research organization.

High Performance Computing Laboratory - An affiliated organization for computer architecture research at URI.

Financial Support from (present and past):

National Science Foundation

No. EIA-9729839. CISE Research Instrumentation: Equipment for Diverse Computer Architecture Research

No. CCR-9708183. Order of Magnitude Instruction Level Parallelism

The Intel Corporation

Mentor Graphics Corporation's Higher Education Program

Xilinx Corporation

URI Office of the Provost

URI Research Office

Software or Hardware Donations from:

Mentor Graphics Corporation's Higher Education Program

Xilinx Corporation

Virtual Computer Corporation

October 8, 2004 | Gus Uht | uht@ele.uri.edu