Linear Algebra for Simulations: Sparse Solvers, Decompositions & Eigenvalues

1. Why Linear Systems Dominate Simulation

Physical laws — conservation of mass, momentum, and energy — become linear systems when discretised in space and time. Three archetypal examples:

FEM: Global Stiffness Matrix

The finite-element method for structural mechanics produces:

K \cdot u = f K \in ℝ^(N\timesN) : global stiffness matrix (sparse, symmetric positive definite) u \in ℝ^N : displacement vector (unknown) f \in ℝ^N : external force vector

N is the number of degrees of freedom (3 per node in 3D). For a mesh with 100,000 nodes, N ≈ 300,000. K has bandwidth determined by mesh connectivity — roughly 20–30 non-zeros per row — yielding about 6 million nonzero entries out of 90 billion total: a sparsity of 99.993%.

Fluid: Pressure Poisson Equation

In incompressible Navier–Stokes simulations (e.g. FLIP fluid, SPH), enforcing ∇·u = 0 requires solving:

\nabla²p = (ρ/Δt) \cdot \nabla\cdotu* (Poisson equation for pressure) // Discretised on a grid with 5-point stencil (2D) or 7-point (3D) A \cdot p = b where A is the discrete Laplacian matrix

The discrete Laplacian on a 256³ grid has N ≈ 16.7 million unknowns. Direct factorisation would require O(N³) ≈ 4.6 × 10²¹ operations — utterly infeasible. This is why iterative solvers exist.

Common Properties

Sparsity: O(N) nonzeros out of N² possible entries
Symmetry: For undirected graphs or self-adjoint operators, A = Aᵀ
Positive definiteness (SPD): xᵀAx > 0 for all x ≠ 0 — enables special algorithms
Structured sparsity: FEM matrices often have block structure following mesh topology

2. Sparse Matrix Formats: CSR, CSC, COO

Sparse matrix formats store only nonzero values and their positions, trading memory for the overhead of index arrays.

COO — Coordinate Format

Three arrays: row indices, column indices, values. Unordered, allows duplicate (row, col) entries (summed on conversion). Best for matrix assembly.

Matrix: [ 5 0 0 ] COO format: [ 0 8 2 ] row = [0, 1, 1, 2, 2] [ 0 3 7 ] col = [0, 1, 2, 1, 2] val = [5, 8, 2, 3, 7]

CSR — Compressed Sparse Row

The most common format for arithmetic. Replace row array with row pointer (length N+1 where rowptr[i+1] − rowptr[i] is the number of nonzeros in row i):

rowptr = [0, 1, 3, 5] // row 0 has 1 nz, row 1 has 2, row 2 has 2 col = [0, 1, 2, 1, 2] val = [5, 8, 2, 3, 7] // Matrix-vector multiply (CSR): for i in 0..N: y[i] = 0 for j in rowptr[i]..rowptr[i+1]: y[i] += val[j] * x[col[j]]

CSC — Compressed Sparse Column

Row and column roles swapped. Preferred by LAPACK convention and for column-oriented operations (e.g. factorisation pivoting). MATLAB's native format is CSC.

Format Comparison

Format	Memory	Matvec	Assembly	Best for
COO	3×nnz	Slow (cache miss)	Excellent	Building from mesh
CSR	2×nnz + N+1	Fast (row cache-local)	Poor	CG, GMRES matvec
CSC	2×nnz + N+1	Fast (col access)	Poor	LU factorisation
BSR	2×nnz/bs + N/bs+1	Fastest (SIMD blocks)	Moderate	FEM block DOFs

3. Direct Solvers: LU, Cholesky, QR

Direct methods factorise A once and then solve via triangular back-substitution in O(N) operations — but the factorisation itself can be expensive and fill-in (creation of nonzeros where A was zero) limits scalability to N ≲ 10⁵ in 3D.

LU Decomposition

For a general n×n matrix: A = P · L · U where P is a permutation matrix (partial pivoting), L is lower triangular with unit diagonal, U is upper triangular.

// Solve Ax = b: PA = LU \to LUx = Pb 1. Forward sub: Ly = Pb O(n²) 2. Back sub: Ux = y O(n²) Factorisation cost: O(n³/3) flops (dense) Sparse fill-in: depends on elimination ordering (AMD, nested dissection)

Sparse LU via UMFPACK or SuperLU uses fill-reducing orderings. The nested dissection ordering gives O(N^1.5) fill-in for 2D grids and O(N^2) for 3D — much better than natural ordering but still worse than iterative solvers for very large 3D problems.

Cholesky Decomposition (SPD)

For symmetric positive definite matrices: A = LLᵀ where L is lower triangular with positive diagonal. No pivoting needed (stability is guaranteed for SPD). Roughly 2× faster than LU — only the lower triangle is stored and processed.

L[i][i] = sqrt(A[i][i] - sum(L[i][k]^2 for k < i)) L[j][i] = (A[j][i] - sum(L[j][k]*L[i][k] for k < i)) / L[i][i] (j > i)

Applications: FEM stiffness matrices, covariance matrices in probabilistic methods, SPD preconditioner factors (incomplete Cholesky IC(0)).

QR Decomposition

Factorises A = QR where Q is orthogonal (Qᵀ = Q⁻¹) and R is upper triangular. Use cases:

Least squares: min‖Ax − b‖₂ → Rx = Qᵀb (n columns, m rows, m > n)
Eigenvalues: QR iteration for dense matrices
Gram-Schmidt orthogonalisation: basis for Krylov subspace methods

Computed via Householder reflectors or Givens rotations. Householder QR is numerically superior to classical Gram-Schmidt — the orthogonality error is O(ε·κ(A)) vs O(ε·κ(A)²) for CGS.

4. Iterative Solvers: CG, GMRES, BiCGSTAB

Iterative methods start from an initial guess x₀ and improve it one "step" at a time. Each step costs O(nnz) for a sparse matrix-vector multiply. They never fully factorise A — making them applicable to problems where direct methods are infeasible.

Conjugate Gradient (CG) — SPD Only

The CG method (Hestenes & Stiefel, 1952) minimises the quadratic form f(x) = ½ xᵀAx − bᵀx over a sequence of A-conjugate search directions. For an SPD matrix, it converges in at most N steps in exact arithmetic (N = matrix size), but in practice converges to machine precision far sooner.

Algorithm CG: r₀ = b − A·x₀; p₀ = r₀; k = 0 while ‖rₖ‖ > tol: αₖ = (rₖᵀrₖ) / (pₖᵀApₖ) // step length x_{k+1} = xₖ + αₖ·pₖ r_{k+1} = rₖ − αₖ·A·pₖ βₖ = (r_{k+1}ᵀr_{k+1}) / (rₖᵀrₖ) // update direction p_{k+1} = r_{k+1} + βₖ·pₖ k = k + 1 Cost per iteration: 1 SpMV (A·pₖ) + 4 dot products + 3 axpy operations Convergence: ‖eₖ‖_A ≤ 2·((√κ−1)/(√κ+1))^k · ‖e₀‖_A where κ = λmax/λmin

GMRES — General Matrices

The Generalised Minimal Residual method (Saad & Schultz, 1986) handles non-symmetric and indefinite systems. It builds an orthonormal Krylov basis {v₁, v₂, ..., vₖ} via Arnoldi iteration and minimises ‖b − Axₖ‖₂ over the Krylov subspace Kₖ = span{v₁, ..., vₖ}.

Krylov subspace: K_k(A, r₀) = span{r₀, Ar₀, A²r₀, ..., A^(k-1)r₀} Minimise: ‖b − A·xₖ‖₂ over x ∈ x₀ + K_k cost: k SpMVs + O(k²) Gram-Schmidt per cycle memory: O(k·N) — limits restart parameter k (typically k = 30–200)

Restarted GMRES(k) restarts the Krylov basis every k steps to control memory. The tradeoff: smaller k uses less memory but may stagnate on difficult systems. Flexible GMRES (FGMRES) allows the preconditioner to change between iterations.

BiCGSTAB

Biconjugate Gradient Stabilised (van der Vorst, 1992) is a variant for non-symmetric systems that avoids GMRES's memory growth. It uses two sequences of residuals to achieve smoother convergence than BiCG (which can show erratic behaviour). Memory: O(N) — only a fixed number of vectors regardless of iteration count. Preferred over GMRES when memory is tight.

5. Preconditioners: Jacobi, ILU, Multigrid

A preconditioner M approximates A and transforms the system to M⁻¹Ax = M⁻¹b. Convergence of CG/GMRES depends on the condition number κ(M⁻¹A) — a good preconditioner reduces κ at low cost.

Jacobi (Diagonal) Preconditioner

M = diag(A) \to M⁻¹x = x / diag(A) Cost: O(N) — just divide each component by the diagonal entry Effect: normalises rows; removes easily-corrected diagonal dominance Works well for: diagonally dominant systems, first pass on poorly scaled A

Incomplete LU: ILU(0) and ILU(k)

Incomplete LU factorises A ≈ L̃Ũ where L̃ and Ũ share the sparsity pattern of A (ILU(0)) or allow k levels of additional fill-in (ILU(k)). The approximate factors are cheap to apply (sparse triangular solves) and substantially reduce κ.

ILU(0): compute LU but drop any fill-in at positions where A_{ij} = 0 → L̃ and Ũ have the same sparsity pattern as the lower/upper triangles of A Apply M⁻¹r: forward solve with L̃, then back solve with Ũ

ILU(k) allows fill-in up to distance k from existing nonzeros. ILU(1) or ILU(2) often reduces iteration count by 5–10× vs. Jacobi at the cost of 3–5× more memory per factor.

Multigrid: Optimal O(N) Convergence

Multigrid preconditioners (or standalone solvers) achieve O(N) total work by solving the problem on a hierarchy of coarser grids and using coarse-grid solutions to correct fine-grid residuals.

Smoothing: a few Gauss-Seidel or Jacobi iterations damp high-frequency error components
Restriction: project residual to coarser grid (full weighting or injection)
Coarse solve: exact solve at coarsest level (trivially small)
Prolongation: interpolate correction back to fine grid
V-cycle: one down-pass + one up-pass; W-cycle: two recursive down-passes

Geometric multigrid requires a structured mesh hierarchy. Algebraic multigrid (AMG) constructs the hierarchy automatically from the matrix — HYPRE's BoomerAMG and ML/MueLu (Trilinos) are standard libraries in scientific computing. For Poisson equations on uniform grids, multigrid converges in O(1) iterations regardless of grid size.

Preconditioner	Setup cost	Apply cost	Typical κ reduction	Best for
None	O(1)	O(1)	1×	Well-conditioned A
Jacobi	O(N)	O(N)	2–5×	Diagonally dominant
SSOR	O(nnz)	O(nnz)	5–20×	General SPD
IC(0)	O(nnz)	O(nnz)	10–50×	SPD sparse
ILU(0)	O(nnz)	O(nnz)	10–50×	Non-symmetric
AMG	O(N log N)	O(N)	O(N)→O(1)×	Elliptic PDE

6. Eigenvalue Algorithms: Power, QR, Lanczos

Eigenvalues appear wherever a simulation has natural frequencies, stability analysis, or modal decomposition. Finding all eigenvalues of an N×N matrix costs O(N³) — only feasible for N ≲ 10³. Large sparse eigenproblems require methods that compute a few eigenvalues cheaply.

Power Iteration

Finds the largest-magnitude eigenvalue λ_max and its eigenvector:

// Power iteration: v₀ = random unit vector repeat: w = A \cdot vₖ vₖ₊₁ = w / ‖w‖ λ_k = vₖᵀ \cdot A \cdot vₖ (Rayleigh quotient) until convergence Rate: ‖error‖ decreases by |λ₂/λ₁| per step

Inverse iteration (replace A by (A − σI)⁻¹) finds the eigenvalue nearest σ by using a linear solve per step — ideal when a direct solver is cheap and the target eigenvalue is known approximately. Shift-invert with CG/GMRES replace the dense solve with sparse iteration.

Dense QR Algorithm

For dense matrices (N ≲ 10³), all eigenvalues can be found via the QR iteration:

A₀ = A Aₖ₊₁ = Rₖ \cdot Qₖ where Aₖ = Qₖ \cdot Rₖ (QR decomposition) Aₖ converges to (quasi-)upper triangular — eigenvalues on diagonal Practical: reduce to Hessenberg form first (O(N³)/3 flops, then O(N²) per step)

Lanczos Algorithm — Large Sparse

The Lanczos algorithm (1950) builds an orthonormal Krylov basis and reduces A to a symmetric tridiagonal matrix T of size k×k (k ≪ N). The eigenvalues of T (Ritz values) approximate the extreme eigenvalues of A. Only k SpMVs are needed — ideal for finding the 10–100 largest or smallest eigenvalues of sparse A.

Lanczos iteration (symmetric A): v₁ = random unit vector for j = 1, 2, ..., k: w = A \cdot vⱼ αⱼ = vⱼᵀ \cdot w w = w - αⱼ\cdotvⱼ - βⱼ₋₁\cdotvⱼ₋₁ βⱼ = ‖w‖; vⱼ₊₁ = w / βⱼ T = tridiag(α₁,...,αₖ; β₁,...,βₖ₋₁) eigvals(T) \to approximate extreme eigvals of A

In practice, loss of orthogonality accumulates — requiring either full re-orthogonalisation (ARPACK's IRLM approach) or selective orthogonalisation (Partial Lanczos). ARPACK (Arnoldi Package) is the standard library: its Python binding is SciPy's `scipy.sparse.linalg.eigsh` and `eigs`.

7. SVD, PCA & Mode Decomposition

The Singular Value Decomposition of an m×n matrix A:

A = U \cdot Σ \cdot Vᵀ U \in ℝ^(m\timesm): left singular vectors (orthogonal) Σ \in ℝ^(m\timesn): diagonal, entries σ₁ \geq σ₂ \geq ... \geq σₚ \geq 0 V \in ℝ^(n\timesn): right singular vectors (orthogonal) p = min(m, n)

The Eckart-Young theorem states that the best rank-k approximation to A (in any orthogonally invariant norm) is:

Aₖ = Σᵢ₌₁ᵏ σᵢ \cdot uᵢ \cdot vᵢᵀ ‖A - Aₖ‖₂ = σₖ₊₁ (spectral norm error = next singular value)

Proper Orthogonal Decomposition (POD)

In fluid dynamics and structural dynamics, POD (the SVD of a snapshot matrix) extracts the dominant spatial modes of a simulation. If U = [u(t₁) | u(t₂) | ... | u(tₙ)] contains N velocity snapshots, then:

Columns of U (left singular vectors) are the POD spatial modes, ordered by energy content σᵢ²
k modes capture fraction (σ₁² + ... + σₖ²) / ‖U‖_F² of total variance
Projecting the PDE onto r POD modes gives a reduced-order model (ROM) of size r ≪ N

Dynamic Mode Decomposition (DMD)

DMD (Schmid, 2010) finds a best-fit linear operator that maps snapshot Uₖ to Uₖ₊₁:

Ũ₂ \approx A \cdot Ũ₁ \to A = Ũ₂ \cdot Ũ₁⁺ (Moore-Penrose pseudoinverse via SVD) // In practice: project A onto r POD modes (cheap), compute eigvals of // r\timesr projected operator, then recover approximate eigenfunctions of A

DMD extracts spatiotemporal coherent structures from simulation data even without access to the governing equations — making it a data-driven reduced modeling tool applicable to turbulence, financial time series, and climate data.

8. JavaScript Sparse Conjugate Gradient

Below is a complete, self-contained implementation of a CSR sparse matrix and Preconditioned Conjugate Gradient solver in JavaScript. This is exactly the precision Poisson solver used in grid-based fluid simulations:

// ============================================================
// Sparse CSR matrix + Preconditioned Conjugate Gradient (PCG)
// Solves A·x = b  where A is SPD and sparse (CSR format)
// ============================================================

class SparseCSR {
  constructor(n) {
    this.n = n;
    this.rowptr = new Int32Array(n + 1);
    this.cols   = [];    // will be converted to Int32Array
    this.vals   = [];    // Float64Array after build
  }

  // Assemble from COO triplets [{row, col, val}]
  static fromCOO(n, coo) {
    // Sort by row then col
    coo.sort((a, b) => a.row !== b.row ? a.row - b.row : a.col - b.col);
    const mat   = new SparseCSR(n);
    const cols  = [], vals = [];
    const rowptr = new Int32Array(n + 1);

    for (const {row, col, val} of coo) {
      rowptr[row + 1]++;
      cols.push(col);
      vals.push(val);
    }
    for (let i = 1; i <= n; i++) rowptr[i] += rowptr[i - 1];
    mat.rowptr = rowptr;
    mat.cols   = new Int32Array(cols);
    mat.vals   = new Float64Array(vals);
    return mat;
  }

  // y = A·x (sparse matrix-vector product)
  matvec(x, y) {
    const {n, rowptr, cols, vals} = this;
    for (let i = 0; i < n; i++) {
      let s = 0;
      for (let j = rowptr[i]; j < rowptr[i + 1]; j++) s += vals[j] * x[cols[j]];
      y[i] = s;
    }
  }

  // Diagonal (Jacobi) preconditioner
  diagInv() {
    const d = new Float64Array(this.n);
    const {n, rowptr, cols, vals} = this;
    for (let i = 0; i < n; i++) {
      for (let j = rowptr[i]; j < rowptr[i + 1]; j++) {
        if (cols[j] === i) { d[i] = 1 / vals[j]; break; }
      }
    }
    return d;
  }
}

// Solves A·x = b using Preconditioned CG
// Returns {x, iter, residual}
function pcg(A, b, opts = {}) {
  const {tol = 1e-6, maxIter = 1000} = opts;
  const n = A.n;
  const diag  = A.diagInv();          // Jacobi preconditioner M⁻¹
  const x     = new Float64Array(n);
  const r     = new Float64Array(n);
  const z     = new Float64Array(n);
  const p     = new Float64Array(n);
  const Ap    = new Float64Array(n);

  // r₀ = b − A·x₀  (x₀ = 0 → r₀ = b)
  for (let i = 0; i < n; i++) r[i] = b[i];

  // z₀ = M⁻¹·r₀  (Jacobi: componentwise multiply)
  let rz = 0;
  for (let i = 0; i < n; i++) { z[i] = r[i] * diag[i]; p[i] = z[i]; rz += r[i] * z[i]; }

  let iter = 0;
  while (iter < maxIter) {
    A.matvec(p, Ap);
    let pAp = 0;
    for (let i = 0; i < n; i++) pAp += p[i] * Ap[i];

    const alpha = rz / pAp;
    let rnorm2 = 0;
    for (let i = 0; i < n; i++) {
      x[i] += alpha * p[i];
      r[i] -= alpha * Ap[i];
      rnorm2 += r[i] * r[i];
    }

    if (Math.sqrt(rnorm2) < tol) break;

    let rzNew = 0;
    for (let i = 0; i < n; i++) { z[i] = r[i] * diag[i]; rzNew += r[i] * z[i]; }
    const beta = rzNew / rz;
    rz = rzNew;
    for (let i = 0; i < n; i++) p[i] = z[i] + beta * p[i];
    iter++;
  }
  return {x, iter, residual: Math.sqrt(rz)};
}

// ---- Example: 1D Poisson on uniform grid, 5 interior points ----
// −u'' = f,  u(0)=u(1)=0,  f=1   → u_i = xi(1−xi)/2
const N  = 5;
const h  = 1 / (N + 1);
const h2 = h * h;

// Build tridiagonal Laplacian matrix in COO
const coo = [];
for (let i = 0; i < N; i++) {
  coo.push({row: i, col: i, val:  2 / h2});
  if (i > 0)     coo.push({row: i, col: i-1, val: -1 / h2});
  if (i < N - 1) coo.push({row: i, col: i+1, val: -1 / h2});
}
const A = SparseCSR.fromCOO(N, coo);
const b = new Float64Array(N).fill(1);         // f=1

const {x, iter} = pcg(A, b, {tol: 1e-10});

console.log("PCG converged in", iter, "iterations");
x.forEach((u, i) => {
  const xi  = (i + 1) * h;
  const ref = xi * (1 - xi) / 2;
  console.log(`u[${i}] = ${u.toFixed(8)}  (ref: ${ref.toFixed(8)})`);
});

Solver selection guide: For N < 5,000 and dense matrices → Cholesky or LU. For N < 100,000 and sparse SPD matrices → sparse Cholesky (CHOLMOD). For N > 100,000 and SPD elliptic PDEs → PCG + AMG preconditioner. For non-symmetric N > 10,000 → GMRES(30) or BiCGSTAB + ILU(1). For eigenvalues (a few, sparse) → Lanczos / ARPACK. In practice: call a library (LAPACK, ARPACK, HYPRE) rather than implementing from scratch — but understanding the algorithm is essential for choosing the right one.