FFT Explained: Why the Fast Fourier Transform Changed the World

The Cooley-Tukey FFT algorithm reduces the Discrete Fourier Transform from O(N²) multiplications to O(N log N) — a speedup that made real-time audio processing, JPEG compression, wireless modulation, and radar signal analysis computationally feasible. The algorithm is 60 years old, but understanding why it works is still one of the most elegant insights in computer science.

1. The Discrete Fourier Transform

The DFT converts a sequence of N time-domain samples into N frequency-domain coefficients. Each output X[k] measures how much of the frequency f_k = k/N (in cycles per sample) is present in the input.

DFT definition: X[k] = Σ_{n=0}^{N−1} x[n] · e^{−j 2π k n / N} k = 0, 1, ..., N−1 Inverse DFT: x[n] = (1/N) · Σ_{k=0}^{N−1} X[k] · e^{+j 2π k n / N} Complex exponential: e^{−j θ} = cos θ − j sin θ X[k] interpreted: amplitude |X[k]| at frequency k/N cycles/sample phase arg(X[k]) = phase shift of that frequency component

The DFT is used everywhere: audio spectrum analysis, noise filtering, image compression (DCT), convolution, correlation, and modulation schemes. The problem is computing it fast.

2. Why the DFT is O(N²)

To compute each X[k], you sum N terms. There are N values of k. So the total number of complex multiplications is N × N = N².

Naive DFT

O(N²)

N = 1024 → ~1M operations

Cooley-Tukey FFT

O(N log N)

N = 1024 → ~10K operations

N (samples)	DFT ops (N²)	FFT ops (N log₂N)	Speedup
64	4 096	384	~10×
1 024	1 048 576	10 240	~100×
65 536	4 294 967 296	1 048 576	~4 000×
1 048 576 (1M)	10¹²	20 971 520	~50 000×

Historical impact: Before the FFT, a 1024-point DFT on 1960s hardware took several minutes. After Cooley and Tukey's 1965 paper, it took seconds. This single algorithmic improvement enabled seismic analysis, television image processing, and ultimately all modern digital communications.

3. Divide and Conquer: The Even/Odd Split

The key insight (Cooley-Tukey, 1965; discovered earlier by Gauss in ~1805): split the N-point DFT into two N/2-point DFTs — one over the even-indexed samples, one over the odd-indexed.

X[k] = Σ_{n=0}^{N−1} x[n] · W_N^{kn} where W_N = e^{−j2π/N} (twiddle factor) Split into even (n = 2m) and odd (n = 2m+1) indices: X[k] = Σ_{m=0}^{N/2−1} x[2m] · W_N^{2km} + Σ_{m=0}^{N/2−1} x[2m+1] · W_N^{k(2m+1)} = Σ_{m=0}^{N/2−1} x[2m] · W_{N/2}^{km} + W_N^k · Σ_{m=0}^{N/2−1} x[2m+1] · W_{N/2}^{km} = E[k] + W_N^k · O[k] where E[k] = DFT of even samples (N/2-point DFT) O[k] = DFT of odd samples (N/2-point DFT) using W_N² = W_{N/2}

This halves the problem size. Because E[k] and O[k] are periodic with period N/2, we only need to compute them once and reuse for k and k + N/2:

Butterfly update: X[k] = E[k] + W_N^k · O[k] X[k + N/2] = E[k] − W_N^k · O[k] (because W_N^{k+N/2} = −W_N^k by periodicity) Recursion: T(N) = 2·T(N/2) + O(N) Solves to: T(N) = O(N log N)

4. The Butterfly Diagram

The butterfly is the fundamental unit of the FFT — two inputs produce two outputs in one step. In an N=8 FFT, three stages of N/2 = 4 butterflies compute the full transform.

Each "X" crossing is a butterfly operation. In an N-point FFT, there are log₂N stages, each with N/2 butterflies — giving N/2 × log₂N total butterfly operations.

5. Twiddle Factors and Periodicity

The term W_N^k = e^{−j2πk/N} is called the twiddle factor (or rotation factor). The key symmetry that makes FFT efficient:

Periodicity: W_N^{k+N} = W_N^k (rotates full circle → same value) Conjugate sym: W_N^{k+N/2} = −W_N^k (180° rotation → negation) Reduction: W_N^{2k} = W_{N/2}^k (half-size twiddle) First 4 twiddles (N=8): W₈⁰ = 1 (no rotation) W₈¹ = (√2/2)(1−j) ≈ 0.707 − 0.707j (−45°) W₈² = −j = (0−1j) (−90°) W₈³ ≈ −0.707 − 0.707j (−135°) The −W_N^k = W_N^{k+N/2} symmetry means each butterfly reuses the same twiddle for two outputs — halving the number of multiplications again.

Pre-computing twiddles: In an optimised FFT, all twiddle factors are computed once and stored in a lookup table. The main loop only moves data around (butterfly add/subtract) plus one complex multiply per butterfly. The multiply is 4 real multiplications and 2 additions (can be reduced to 3 muls with a Karatsuba trick).

6. Bit-Reversal Permutation

The Cooley-Tukey algorithm naturally produces output in a scrambled order. For in-place computation, the input must first be reordered using bit-reversal permutation: the index n is replaced by its binary digits reversed.

Bit-reversal for N=8 (3 bits): Index Binary Reversed New index 0 000 000 0 1 001 100 4 2 010 010 2 3 011 110 6 4 100 001 1 5 101 101 5 6 110 011 3 7 111 111 7 Swap: (0\leftrightarrow0), (1\leftrightarrow4), (2\leftrightarrow2), (3\leftrightarrow6), (4\leftrightarrow1)*, (5\leftrightarrow5), (6\leftrightarrow3)*, (7\leftrightarrow7) (* already handled by the reverse swap — only swap when rev > i to avoid double-swap)

7. JavaScript FFT Implementation

A clean iterative (in-place) Cooley-Tukey radix-2 FFT in JavaScript:

// FFT of size N (must be power of 2)
// Input: re[] and im[] arrays (real/imaginary parts)
// Output: re[] and im[] modified in-place
function fft(re, im) {
  const N = re.length;

  // 1. Bit-reversal permutation
  let j = 0;
  for (let i = 1; i < N; i++) {
    let bit = N >> 1;
    for (; j & bit; bit >>= 1) j ^= bit;
    j ^= bit;
    if (i < j) {
      [re[i], re[j]] = [re[j], re[i]];
      [im[i], im[j]] = [im[j], im[i]];
    }
  }

  // 2. Butterfly stages
  for (let len = 2; len <= N; len <<= 1) {
    const halfLen = len >> 1;
    const ang = -2 * Math.PI / len;     // remove minus sign for inverse FFT
    const wRe = Math.cos(ang);
    const wIm = Math.sin(ang);

    for (let i = 0; i < N; i += len) {
      let curRe = 1, curIm = 0;         // starts at W^0 = 1

      for (let k = 0; k < halfLen; k++) {
        const uRe = re[i + k];
        const uIm = im[i + k];
        const vRe = re[i + k + halfLen] * curRe - im[i + k + halfLen] * curIm;
        const vIm = re[i + k + halfLen] * curIm + im[i + k + halfLen] * curRe;

        // Butterfly: u + W·v  and  u - W·v
        re[i + k]          = uRe + vRe;
        im[i + k]          = uIm + vIm;
        re[i + k + halfLen] = uRe - vRe;
        im[i + k + halfLen] = uIm - vIm;

        // Rotate twiddle factor by W_len
        const newRe = curRe * wRe - curIm * wIm;
        curIm       = curRe * wIm + curIm * wRe;
        curRe       = newRe;
      }
    }
  }
}

// Usage: compute spectrum of a real signal
const N = 1024;
const re = new Float64Array(N);
const im = new Float64Array(N);  // zero for real input
// ... fill re[] with samples ...
fft(re, im);
// re[k], im[k] = complex spectrum at frequency k/N (cycles/sample)
// |X[k]| = Math.hypot(re[k], im[k])
// phase   = Math.atan2(im[k], re[k])

Performance note: For audio (N=2048 at 44.1 kHz), this processes a frame in <1 ms in modern JS. For N=1M samples, use typed arrays (Float32Array) and WASM for best performance. WebAudio API's AnalyserNode wraps a native FFT — use that for real-time audio visualisers.

8. Applications: Audio, Images, OFDM

Audio Spectrum and Filtering

Audio processing uses FFT continuously: take N samples (window), FFT → modify frequency bins (equaliser, noise cancellation, pitch detection) → IFFT → overlap-add to reconstruct. Real-time audio at 44.1 kHz uses N = 512–4096 point FFTs refreshed 10–100 times per second.

Fast Convolution O(N log N)

The convolution theorem: pointwise multiplication in frequency domain = convolution in time domain. Convolving two N-point signals directly: O(N²). Via FFT → multiply → IFFT: O(N log N). Used in: audio reverb (convolution reverb with impulse responses), image blurring, polynomial multiplication.

Convolution theorem: (x * h)[n] = IFFT{ FFT{x} \cdot FFT{h} } Complexity: 3 \times O(N log N) vs O(N²) direct Crossover: FFT faster when N > ~32 samples

JPEG: The 2D DCT

JPEG compression uses the 2D Discrete Cosine Transform (DCT-II) on 8×8 blocks. The DCT is a real-valued specialisation of the DFT (only cosine basis, no sine). Most energy concentrates in the low-frequency DCT coefficients → quantise high-freq coefficients aggressively. The 8×8 2D-DCT is computed via two 1D DCTs (row then column) — each O(8 log 8) ≈ O(24).

OFDM Wireless Modulation

OFDM (Orthogonal Frequency-Division Multiplexing) — used in Wi-Fi, 4G/5G, DAB radio — encodes data on N orthogonal subcarriers. The IFFT converts frequency-domain symbols (QAM) into a time-domain waveform for transmission. The receiver FFTs the received signal back to frequency domain. N = 64 (Wi-Fi 802.11a/g), 2048 (LTE), 4096 (5G NR).

Application	Typical N	Domain
Audio spectrum display	512–4096	1D FFT, real input
MP3 / AAC compression	512–2048 (MDCT)	Modified DCT
JPEG compression	8×8 blocks (DCT)	2D DCT
MRI image reconstruction	256²–1024²	2D FFT (k-space)
Radar pulse compression	512–65536	1D FFT matched filter
5G NR OFDM	128–4096	IFFT for modulation
Polynomial multiplication	Any power of 2	1D FFT (NTT for exact)