"Work smarter, not harder" is a slogan often bandied about by business efficiency task forces. Everyone knows that a modicum of planning and organization can save a lot of wasted effort. Smart managers are constantly looking for ideas that will streamline their operations and improve the productivity of their workforces. Any edge, even one as small as one percent, is worth it.

So how about an idea that increases productivity by a thousand percent?

The "business" of scientific computing saw just such an idea back in 1965, stuffed in a suggestion box known as Mathematics of Computation, a journal published by the American Mathematical Society. In a five-page paper titled "An Algorithm for the Machine Calculation of Complex Fourier Series," James W. Cooley of the IBM T.J. Watson Research Center (now at the University of Rhode Island) and John W. Tukey of Princeton University and AT&T Bell Laboratories laid out a scheme that sped up one of the most common activities in scientific and engineering practice: the computation of Fourier transforms.

Their algorithm, which soon came to be called the fast Fourier transform--FFT for short--is widely credited with making many innovations in modern technology feasible. Its impact extends from biomedical engineering to the design of aerodynamically efficient aircraft. Over the last two and a half decades, Cooley and Tukey's paper has been cited in well over a thousand articles in journals ranging from Geophysics to Applied Spectroscopy. According to Gil Strang of MIT, the FFT is "the most valuable numerical algorithm in our lifetime."

Curiously, "An Algorithm for the Machine Calculation of Complex Fourier Series" came close to never being published. But more about that later. First, why is the FFT such an important algorithm?

To understand why the fast Fourier transform has had such a profound impact on technology requires some appreciation of the Fourier transform itself. Whole books have been written on the subject of Fourier analysis and its applications. Their common theme is the incredible versatility of a simple mathematical tool.

The Fourier transform stands at the center of signal processing, which encompasses everything from satellite communications to medical imaging, from acoustics to spectroscopy. Fourier analysis, in the guise of X-ray crystallography, was essential to Watson and Crick's discovery of the double helix, and it continues to be important for the study of protein and viral structures. The Fourier transform is a fundamental tool, both theoretically and computationally, in the solution of partial differential equations. As such, it's at the heart of mathematical physics, from Fourier's analytic theory of heat to the most modern treatments of quantum mechanics. Any kind of wave phenomenon, be it seismic, tidal, or electromagnetic, is a candidate for Fourier analysis. Many statistical processes, such as the removal of "noise" from data and computing correlations, are also based on working with Fourier transforms.

Fourier analysis had been around for 150 years when Cooley and Tukey's paper appeared. The theory and many of the applications had been highly developed. But the relatively primitive state of the computational aspects of Fourier analysis limited what could be done in practice. The fast Fourier transform changed all that.

Partly at fault for the primitive state of Fourier computation was the utter conceptual simplicity of the transform itself. Stripped to its mathematical essentials, the digital process of computing a Fourier transform boils down to multiplying a vector of data by a matrix consisting of roots of unity. If there are $N$ data points, then the entry in the $h$th row and $k$th column of the $N \times N$ "Fourier matrix" is $e^{2\pi ihk/N}$. One is hard pressed to think of a more elegant or more straightforward mathematical idea.

The problem is, every entry in the Fourier matrix is non-zero, which means that, in practice, the conceptually simple task of multiplying a matrix by a vector becomes an onerous chore involving a total of $N^2$ multiplications. If $N$ is small, the prospect is not so bad. But when you're trying to work with large data sets, the $N^2$ slowdown becomes a computational bottleneck. Even at $N = 1000$, the amount of computation--$N^2 = 1,000,000$ multiplications--becomes a strain.
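The matrix-vector description above can be made concrete with a short sketch in Python (not from the article; the function names are illustrative). It builds the Fourier matrix exactly as defined, with entry $e^{2\pi ihk/N}$, and applies it directly, so each call really does perform all $N^2$ multiplications:

```python
import numpy as np

def fourier_matrix(N):
    # Entry (h, k) is exp(2*pi*i*h*k/N), as in the article's definition.
    h, k = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    return np.exp(2j * np.pi * h * k / N)

def naive_transform(x):
    # A dense matrix-vector product: every entry is nonzero,
    # so this costs N^2 complex multiplications.
    return fourier_matrix(len(x)) @ x
```

One caveat on conventions: most library routines (NumPy's `np.fft.fft` included) put a minus sign in the exponent, so with the article's plus-sign convention, `naive_transform(x)` matches `len(x) * np.fft.ifft(x)` rather than `np.fft.fft(x)`.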

The fast Fourier transform is a simple and elegant way of streamlining this laborious computation. As is often the case with simple and elegant methods, many of the ideas underlying the fast Fourier transform had been anticipated by others, including, in this case, Gauss (back in 1805--even before Fourier), but it was Cooley and Tukey who turned the technique into a staple of scientific computing.

In essence, they realized (as others had before them) that the straightforward approach to the Fourier transform had the computer doing the exact same multiplications over and over--a process that can be likened to a store clerk making a separate trip to the stockroom for each and every item in an order. By organizing the computation in an efficient manner that took advantage of the algebraic properties of the Fourier matrix, Cooley and Tukey found that they could eliminate almost all of these redundant calculations. The saving achieved by their algorithm is staggering.

Mathematically, the fast Fourier transform is based on a factorization of the Fourier matrix into a collection of sparse matrices--matrices in which most of the entries are equal to zero. The process is easiest to explain, and implement, when $N$ is a power of 2, such as 1024. It begins with a factorization that reduces the Fourier matrix of size $N$ to two copies of the Fourier matrix of size $N/2$. This reduction continues for $\lg N$ steps ("lg" standing for the base-2 logarithm), until the original matrix is written as a product of $2\lg N$ sparse matrices, half of which can be collapsed to a single permutation matrix. (The diligent reader is invited to try his or her hand at working out the details.)
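For readers who would rather see the recursion than work out the matrix factorization by hand, here is a minimal radix-2 sketch in Python (an illustration, not Cooley and Tukey's own formulation). Each call splits the data into even- and odd-indexed halves, transforms each half, and stitches the results together with one "butterfly" pass:

```python
import numpy as np

def fft_radix2(x):
    # Recursive radix-2 FFT for len(x) a power of 2, using the article's
    # sign convention: entry (h, k) of the Fourier matrix is exp(2*pi*i*h*k/N).
    N = len(x)
    if N == 1:
        return np.asarray(x, dtype=complex)
    even = fft_radix2(x[0::2])   # size-N/2 transform of even-indexed samples
    odd = fft_radix2(x[1::2])    # size-N/2 transform of odd-indexed samples
    # Roots of unity ("twiddle factors") that recombine the two halves.
    twiddle = np.exp(2j * np.pi * np.arange(N // 2) / N)
    # The butterfly step: each output needs only one new multiplication.
    return np.concatenate([even + twiddle * odd,
                           even - twiddle * odd])
```

The recursion bottoms out after $\lg N$ levels, and each level does about $N/2$ multiplications, which is where the $N\lg N$ operation count discussed below comes from.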

The upshot of all this is that a computation that originally required $N^2$ multiplications can now be done with only $N\lg N$ multiplications. (The number of additions is likewise much smaller.) For $N = 1024$, that's a hundredfold less computation. Moreover, the speedup factor actually gets better as the problem gets larger! (This is one of the ways in which mathematical improvements can outpace other approaches to efficiency. For example, building a new computer that runs a hundred times faster than the one it replaces is fine, but then all you've got is a hundredfold speedup. Faster algorithms, on the other hand, often post ever bigger gains on ever bigger problems.)
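The growing advantage is easy to tabulate. A few lines of Python (illustrative only) compare the two operation counts and their ratio $N/\lg N$ for increasing $N$:

```python
import math

def multiplication_counts(N):
    # Direct transform: N^2 multiplications.
    # Fast transform: N * lg(N) multiplications (N a power of 2).
    lg_N = int(math.log2(N))
    return N * N, N * lg_N

for N in (1024, 4096, 1048576):
    direct, fast = multiplication_counts(N)
    print(N, direct, fast, direct // fast)
```

At $N = 1024$ the ratio is about a hundred; at a million points it has grown to roughly fifty thousand, which is the sense in which the algorithm "posts ever bigger gains on ever bigger problems."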

Sometimes theoretical improvements in the efficiency of an algorithm remain at the level of theory, but that hasn't been the case with the fast Fourier transform. Every computer that does numerical computation on a large scale runs some version of the Cooley-Tukey algorithm. The fast Fourier transform was seized upon by researchers interested in everything from signal detection in radar systems to the assessment of heart valve damage by biomedical instrumentation. One measure of the impact of Cooley and Tukey's algorithm can be found in the Science Citation Index: Their 1965 paper is still cited anywhere from 50 to 100 times each year.

What all the applications have in common is huge, sometimes mind-boggling amounts of data. Moreover, for many applications, the data must be dealt with in something close to real time. It doesn't do much good to detect an incoming missile if the signal is processed after the warhead detonates, nor is a cardiologist's report that arrives on the day of the wake of any avail. (Even more Fourier-intensive are such imaging techniques as CAT scans and MRI. Researchers have also applied Fourier analysis in devices that monitor blood flow by means of an ultrasonic Doppler shift, sort of like a vascular radar gun.)

"Everywhere the Fourier transform is used, the fast Fourier transform can do better," says Ingrid Daubechies, an expert on the mathematics of signal processing at AT&T Bell Laboratories and Rutgers University. (Daubechies's specialty, wavelets, may take over part of the burden now shouldered almost exclusively by the fast Fourier transform, but it won't supplant it entirely. In some regards, the potential importance of wavelet theory is built on the success of the FFT.) "Computations that would have taken years can be done in seconds" with the fast Fourier transform, she explains.

The FFT was put right to work by Lee Alsop, a geophysicist at IBM and Columbia University. Alsop analyzed the seismographic record of a 1965 earthquake in the Rat Islands, Alaska, using 2048 data points representing a 13.5-hour period. A conventional Fourier transform took more than 26 minutes to do the analysis; the FFT spit out the answer in 2.4 seconds. Moreover, by running tests with numerically generated data, Alsop found that the fast Fourier transform was not only faster, but also more accurate than conventional methods.

A 1966 paper, "Fast Fourier Transforms for Fun and Profit," by Morven Gentleman and Gordon Sande, put Alsop's observation on a firm footing. Gentleman and Sande showed that, on average, the fast Fourier transform error was lower by a factor of $N/$lg $N$, which was also the speedup factor. Their paper explored many other aspects of the FFT as well, including the computation of convolutions, which is at the heart of digital signal processing.
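The convolution computation mentioned above deserves a brief illustration. By the convolution theorem, a circular convolution can be done by transforming both signals, multiplying pointwise, and transforming back, replacing an $N^2$ sum with three FFTs. A minimal sketch in Python (the function names are my own, not from the Gentleman-Sande paper):

```python
import numpy as np

def circular_convolve_direct(a, b):
    # The defining O(N^2) sum: c[n] = sum_k a[k] * b[(n - k) mod N].
    N = len(a)
    return np.array([sum(a[k] * b[(n - k) % N] for k in range(N))
                     for n in range(N)])

def circular_convolve_fft(a, b):
    # Convolution theorem: multiply the transforms pointwise, then invert.
    # Three FFTs, so O(N log N) overall.
    return np.fft.ifft(np.fft.fft(a) * np.fft.fft(b))
```

For real inputs the FFT route returns a (numerically) real result up to roundoff, and for long signals it is dramatically faster, which is exactly why it sits at the heart of digital filtering.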

Cooley takes pains to praise the Gentleman-Sande paper, as well as an earlier paper by Sande (who was a student of Tukey's) that was never published. In fact, Cooley says, the Cooley-Tukey algorithm could well have been known as the Sande-Tukey algorithm were it not for the "accident" that led to the publication of the now-famous 1965 paper. As he recounts it, the paper he co-authored with Tukey came to be written mainly because a mathematically inclined patent attorney happened to attend the seminar in which Cooley described the algorithm. The lawyer, Frank Thomas, "saw that it was a patentable idea," Cooley explains, "and the policy of IBM and their lawyers was to be sure that nobody bottled up software and algorithms by getting patents on them." A decision was quickly reached to put the fast Fourier transform in the public domain, and that meant, in part, publishing a paper.

The publication of the Cooley-Tukey algorithm brought on an explosion of activity, as researchers from all branches of science rushed to apply the new technique to their own computational problems. One of the most spectacular applications--at the time--was an FFT computation of a sequence of 512,000 data points taken from interferometer measurements. This computation, done using a program designed by Norman Brenner at MIT, enabled Janine Connes, an astronomer at the University of Paris, to calculate the infrared spectra of the planets for a book that has become a standard reference in the subject.

The main problem with Connes's computation was that the 512,000 data points exceeded what was then available in high-speed memory. Brenner's solution was to "schedule" the fast Fourier transform so that data could be moved in and out of auxiliary storage without paying too great a penalty. This technique, modified to new hardware and new computer architectures, is still very much in use.

The applications of the fast Fourier transform are continuing to expand, as computers themselves become more powerful and as researchers continue to refine the algorithm for particular problems and adapt it to new machine architectures. There are even hints of faster Fourier transforms to come. Steven Orszag of Princeton University and Eytan Barouch of Clarkson University, for example, are putting some new twists into the notion of the FFT in order to carry out the particularly large computations involved in large-scale simulations of computer chips. It would only be fitting if a fast Fourier transform designed its own efficient circuitry.

In the meantime, the fast Fourier transform is rapidly becoming part and parcel of the engineering curriculum. Textbooks on the subject are now appearing, and the technique is even being taught at the undergraduate level. Who knows? It could even show up someday in the business school syllabus. At the very least, it's a good model for what can be achieved with a little bit of organized effort. It's easy to find algorithms that work harder than the fast Fourier transform; it's hard to find any that work smarter.

Barry A. Cipra is a mathematician and writer based in Northfield, Minnesota.

Volume 26-3, May 1993

(C) 1993 by Society for Industrial and Applied Mathematics

All rights reserved.