Software Algorithms Add Fuel to Processor Speed

August 1999
By Fred V. Reed

The next jump in processor capabilities may come from number crunching rather than from micron splitting.

A new twist on age-old logarithms may hold the key to faster computer microprocessors. An improved approach to logarithmic arithmetic is finally allowing it to compete with current algorithms used in the central processing units of computers.

Previous processor speeds largely owed their advances to new chip designs. Engineers constantly design chip architectures to permit operation at higher speeds. Similar advances have been achieved by shrinking the semiconductor’s detail resolution progressively to below 1 micron.

Now, the focus again is on algorithms. The new logarithmic approach, known as exponential floating point (EFP), or sometimes called logarithmic number system (LNS), offers the potential of speeding up existing chips or allowing designers to plan more efficient processors in the future.

The consequence, according to Les Pickett, chief technical officer of Log Point Technologies, Mountain View, California, which holds patents on the technology, should be an increase in computational speed perhaps reaching a factor of 100, with the silicon area diminishing by perhaps 90 percent.

“I think EFP arithmetic will definitely become standard for floating-point computation using a data format of up to about 40 bits, which includes the lion’s share of embedded and graphics computation,” says Pickett. Logarithms have been well known for centuries, and numerous papers exist in the mathematical literature dealing with their possible use in computers. Their adoption has been prevented by problems with logarithms, notably their inefficiency in addition and subtraction. Log Point representatives say the company has solved these problems.

Current central processing units (CPUs) use floating-point algorithms. Numbers are stored in a sort of binary equivalent of scientific notation, with a significand and an exponent. For example, to simplify slightly, 2,340 would be stored as two values—2.34, the significand; and 3, the exponent of 10—except in binary rather than decimal. Multiplication is done by adding the exponents to get the exponent of the answer. The significand is obtained by shifting and adding, much as in ordinary long multiplication taught in grade school.

This works and can be speeded by hardware. The complexity of the associated hardware and the relatively low speed of execution make it less than ideal for multiplication-intensive applications. Addition and subtraction, by contrast, are fast in conventional floating-point operations.

However, when numbers are stored in logarithmic form, multiplication and division are very fast because their function is reduced to the addition of logarithms. Furthermore, accuracy improves because answers are exact, while current methods can give a round-off error of a half bit.

This accelerated multiplication with logs is particularly important because many vital applications today rely heavily on multiplication, for example three-dimensional graphics, digital signal processing and fast Fourier transforms.

Pickett explains, “In an EFP system, the floating-point multiplication or division is affected simply as a corresponding addition or subtraction of exponents because each significand possesses an implicit value of one, with any product or ratio of such signficands also being implicitly one. These efficient multiplication and division operations, however, require both operands to be already in EFP form, which requires generation of a logarithmic function of the conventional significand. Similarly, eventual generation of a corresponding exponential (for example, antilogarithmic) function is generally required following a sequence of EFP operations.”

The problem with logarithmic arithmetic is that addition and subtraction become difficult, involving complex and nonlinear functions that cannot be calculated quickly. The focus of research has therefore been almost entirely on finding ways of making logarithmic addition and subtraction competitive and in speed with the current nonlogarithmic methods.

A suggested solution has been to use look-up tables that contain values already calculated and stored. In the past, the size of these tables has been so great as to prevent their use in practical machines. Further, the size of the tables grows exponentially with the length of the numbers represented. This has led to schemes for using interpolation between values stored in tables. Unfortunately, interpolation takes time, which at least partly offsets the advantages of using logs in the first place. It can also introduce inaccuracies.

Pickett says Log Point has overcome the problem. “Our Soft CoProcessor libraries for 32-bit microprocessors provide a 32-bit floating-point capability that is numerically equivalent to the Institute of Electrical and Electronics Engineers’ (IEEE) 754 standard floating point. The core tables comprise 30 to 50 kilobytes.” For 8-bit applications, the tables are far smaller, he says, and handle logs, exponents and general powers.

While the high-end microprocessors used in personal computers and workstations get more publicity, the overwhelming majority of processors are manufactured for embedded applications to run things from refrigerators to torpedoes. Embedded processors run the gamut from very simple 8-bit chips to sophisticated 64-bit machines. Some of these have to handle large amounts of input data quickly.

“Consequently,” Pickett says, “the conversion of fixed-point data, say from sensors into log format, requires a conversion from fixed point and then by an antilog operation back into fixed point. These conversions are a major portion of many commercial applications.” According to Pickett, Log Point’s algorithms are the fastest currently available.

The mathematics can get involved, and questions of how fast what can do what and exactly how depend greatly on whether the operations are being done in software, hardware, read-only memory (ROM) or random access memory (RAM), to what precision, and with what mix of operations. Some of Log Point’s reported efficiencies are eye-catching, and skepticism exists in some quarters.

As an example of a particular instance of generating table functions, Pickett says, “As commonly implemented, such function generation often takes 50 to 100-plus cycles if a hardware multiplier is used, or 300 to 1,000-plus cycles if not. Log Point software, using a total of 5 to 10 percent of a single megabyte of tables, can execute either of the required functions in seven cycles on a low-cost microprocessor. There is a long list of very well-informed people who believe this is impossible.”

Pickett says the company’s Soft CoProcessor can be placed in either ROM or RAM, and the Soft CoProcessor’s mathematical libraries are available as C++ class libraries that can be incorporated into existing C++ applications in a few minutes.

“This C++ code provides about 2.5 to 3.5 times speedup in linear applications—about equal quantities of add and multiply operations and little else—when compared with the better competing math libraries,” Pickett states. “In nonlinear applications relying more heavily on multiply, divide, square root, log, exponential and other scientific functions, speedups of five to 10 times are common. However, Log Point’s function generators can often substantially extend these speedups at a cost of a few kilobytes of look-up tables and contracted construction for specialized computations.”

The technology already is emerging from the laboratory into the marketplace. Dreamscape Net, a market research and consulting firm, is representing Log Point in Asian markets. Dreamscape officials state that their company has evaluated the technology over the past six months, and they believe it is “a perfect fit” for many major Japanese and other Asian high-technology companies.

Two areas that may benefit soon from the technology are video games and set-top boxes, both of which require tremendous speed, enhanced effectiveness and reduced silicon area. Next, Dreamscape officials foresee the emphasis shifting to personal computers and broadband Internet applications as well as cellular telephony.


Multiplication Through Addition

Logarithms, codified by the Scottish mathematician Napier in the 17th century, have been used in mathematics for centuries. The logarithm of a number in base 10 is simply the exponent to which 10 must be raised to give the original number. For example, the base-10 logarithm of 100 is 2; 10 must be raised to the second power to give 100. The log of 1,000 is 3. To multiply 100 by 1,000, one simply adds the logarithms 2+3=5 to get the logarithm of the product.

All positive numbers greater than zero have logs that can easily be generated by algorithms commonly used in calculators, for example. Thus, log 698 is 2.84, and log 75 is 1.87. The log of the product is thus 2.84 plus 1.88, or 4.71, and taking the antilog, the product is 51,286. (The error is due to rounding of the logs.)

Binary logs work the same way, except that the base is 2. For example, the binary log of 8 is 3, because 2 raised to the third power—2 times 2 times 2—is 8.

Logs are the principle behind the slide rule, long the engineer’s chief tool for calculation until the advent of the pocket calculator. The position of each number on the slide rule is proportional to the logarithm of that number. What a slide rule actually does is to add the lengths, and thus in effect the logarithms, to give the product.