Linear Algebra for Simulations: Sparse Solvers, Decompositions & Eigenvalues
Every physics simulation eventually reduces to a linear system Ax = b. Whether it is the pressure Poisson equation in fluid dynamics, the global stiffness matrix in finite-element analysis, or the Laplacian in heat diffusion, the choice of linear solver determines whether a simulation runs in milliseconds or hours. This article surveys the algorithms — from direct LU decomposition to iterative Krylov methods — that power modern computational physics.
1. Why Linear Systems Dominate Simulation
Physical laws — conservation of mass, momentum, and energy — become linear systems when discretised in space and time. Three archetypal examples:
FEM: Global Stiffness Matrix
The finite-element method for structural mechanics produces:
N is the number of degrees of freedom (3 per node in 3D). For a mesh with 100,000 nodes, N ≈ 300,000. K has bandwidth determined by mesh connectivity — roughly 20–30 non-zeros per row — yielding about 6 million nonzero entries out of 90 billion total: a sparsity of 99.993%.
Fluid: Pressure Poisson Equation
In incompressible Navier–Stokes simulations (e.g. FLIP fluid, SPH), enforcing ∇·u = 0 requires solving:
The discrete Laplacian on a 256³ grid has N ≈ 16.7 million unknowns. Direct factorisation would require O(N³) ≈ 4.6 × 10²¹ operations — utterly infeasible. This is why iterative solvers exist.
Common Properties
- Sparsity: O(N) nonzeros out of N² possible entries
- Symmetry: For undirected graphs or self-adjoint operators, A = Aᵀ
- Positive definiteness (SPD): xᵀAx > 0 for all x ≠ 0 — enables special algorithms
- Structured sparsity: FEM matrices often have block structure following mesh topology
2. Sparse Matrix Formats: CSR, CSC, COO
Sparse matrix formats store only nonzero values and their positions, trading memory for the overhead of index arrays.
COO — Coordinate Format
Three arrays: row indices, column indices, values. Unordered, allows duplicate (row, col) entries (summed on conversion). Best for matrix assembly.
CSR — Compressed Sparse Row
The most common format for arithmetic. Replace row array with row pointer (length N+1 where rowptr[i+1] − rowptr[i] is the number of nonzeros in row i):
CSC — Compressed Sparse Column
Row and column roles swapped. Preferred by LAPACK convention and for column-oriented operations (e.g. factorisation pivoting). MATLAB's native format is CSC.
Format Comparison
| Format | Memory | Matvec | Assembly | Best for |
|---|---|---|---|---|
| COO | 3×nnz | Slow (cache miss) | Excellent | Building from mesh |
| CSR | 2×nnz + N+1 | Fast (row cache-local) | Poor | CG, GMRES matvec |
| CSC | 2×nnz + N+1 | Fast (col access) | Poor | LU factorisation |
| BSR | 2×nnz/bs + N/bs+1 | Fastest (SIMD blocks) | Moderate | FEM block DOFs |
3. Direct Solvers: LU, Cholesky, QR
Direct methods factorise A once and then solve via triangular back-substitution in O(N) operations — but the factorisation itself can be expensive and fill-in (creation of nonzeros where A was zero) limits scalability to N ≲ 10⁵ in 3D.
LU Decomposition
For a general n×n matrix: A = P · L · U where P is a permutation matrix (partial pivoting), L is lower triangular with unit diagonal, U is upper triangular.
Sparse LU via UMFPACK or SuperLU uses fill-reducing orderings. The nested dissection ordering gives O(N^1.5) fill-in for 2D grids and O(N^2) for 3D — much better than natural ordering but still worse than iterative solvers for very large 3D problems.
Cholesky Decomposition (SPD)
For symmetric positive definite matrices: A = LLᵀ where L is lower triangular with positive diagonal. No pivoting needed (stability is guaranteed for SPD). Roughly 2× faster than LU — only the lower triangle is stored and processed.
Applications: FEM stiffness matrices, covariance matrices in probabilistic methods, SPD preconditioner factors (incomplete Cholesky IC(0)).
QR Decomposition
Factorises A = QR where Q is orthogonal (Qᵀ = Q⁻¹) and R is upper triangular. Use cases:
- Least squares: min‖Ax − b‖₂ → Rx = Qᵀb (n columns, m rows, m > n)
- Eigenvalues: QR iteration for dense matrices
- Gram-Schmidt orthogonalisation: basis for Krylov subspace methods
Computed via Householder reflectors or Givens rotations. Householder QR is numerically superior to classical Gram-Schmidt — the orthogonality error is O(ε·κ(A)) vs O(ε·κ(A)²) for CGS.
4. Iterative Solvers: CG, GMRES, BiCGSTAB
Iterative methods start from an initial guess x₀ and improve it one "step" at a time. Each step costs O(nnz) for a sparse matrix-vector multiply. They never fully factorise A — making them applicable to problems where direct methods are infeasible.
Conjugate Gradient (CG) — SPD Only
The CG method (Hestenes & Stiefel, 1952) minimises the quadratic form f(x) = ½ xᵀAx − bᵀx over a sequence of A-conjugate search directions. For an SPD matrix, it converges in at most N steps in exact arithmetic (N = matrix size), but in practice converges to machine precision far sooner.
GMRES — General Matrices
The Generalised Minimal Residual method (Saad & Schultz, 1986) handles non-symmetric and indefinite systems. It builds an orthonormal Krylov basis {v₁, v₂, ..., vₖ} via Arnoldi iteration and minimises ‖b − Axₖ‖₂ over the Krylov subspace Kₖ = span{v₁, ..., vₖ}.
Restarted GMRES(k) restarts the Krylov basis every k steps to control memory. The tradeoff: smaller k uses less memory but may stagnate on difficult systems. Flexible GMRES (FGMRES) allows the preconditioner to change between iterations.
BiCGSTAB
Biconjugate Gradient Stabilised (van der Vorst, 1992) is a variant for non-symmetric systems that avoids GMRES's memory growth. It uses two sequences of residuals to achieve smoother convergence than BiCG (which can show erratic behaviour). Memory: O(N) — only a fixed number of vectors regardless of iteration count. Preferred over GMRES when memory is tight.
5. Preconditioners: Jacobi, ILU, Multigrid
A preconditioner M approximates A and transforms the system to M⁻¹Ax = M⁻¹b. Convergence of CG/GMRES depends on the condition number κ(M⁻¹A) — a good preconditioner reduces κ at low cost.
Jacobi (Diagonal) Preconditioner
Incomplete LU: ILU(0) and ILU(k)
Incomplete LU factorises A ≈ L̃Ũ where L̃ and Ũ share the sparsity pattern of A (ILU(0)) or allow k levels of additional fill-in (ILU(k)). The approximate factors are cheap to apply (sparse triangular solves) and substantially reduce κ.
ILU(k) allows fill-in up to distance k from existing nonzeros. ILU(1) or ILU(2) often reduces iteration count by 5–10× vs. Jacobi at the cost of 3–5× more memory per factor.
Multigrid: Optimal O(N) Convergence
Multigrid preconditioners (or standalone solvers) achieve O(N) total work by solving the problem on a hierarchy of coarser grids and using coarse-grid solutions to correct fine-grid residuals.
- Smoothing: a few Gauss-Seidel or Jacobi iterations damp high-frequency error components
- Restriction: project residual to coarser grid (full weighting or injection)
- Coarse solve: exact solve at coarsest level (trivially small)
- Prolongation: interpolate correction back to fine grid
- V-cycle: one down-pass + one up-pass; W-cycle: two recursive down-passes
Geometric multigrid requires a structured mesh hierarchy. Algebraic multigrid (AMG) constructs the hierarchy automatically from the matrix — HYPRE's BoomerAMG and ML/MueLu (Trilinos) are standard libraries in scientific computing. For Poisson equations on uniform grids, multigrid converges in O(1) iterations regardless of grid size.
| Preconditioner | Setup cost | Apply cost | Typical κ reduction | Best for |
|---|---|---|---|---|
| None | O(1) | O(1) | 1× | Well-conditioned A |
| Jacobi | O(N) | O(N) | 2–5× | Diagonally dominant |
| SSOR | O(nnz) | O(nnz) | 5–20× | General SPD |
| IC(0) | O(nnz) | O(nnz) | 10–50× | SPD sparse |
| ILU(0) | O(nnz) | O(nnz) | 10–50× | Non-symmetric |
| AMG | O(N log N) | O(N) | O(N)→O(1)× | Elliptic PDE |
6. Eigenvalue Algorithms: Power, QR, Lanczos
Eigenvalues appear wherever a simulation has natural frequencies, stability analysis, or modal decomposition. Finding all eigenvalues of an N×N matrix costs O(N³) — only feasible for N ≲ 10³. Large sparse eigenproblems require methods that compute a few eigenvalues cheaply.
Power Iteration
Finds the largest-magnitude eigenvalue λ_max and its eigenvector:
Inverse iteration (replace A by (A − σI)⁻¹) finds the eigenvalue nearest σ by using a linear solve per step — ideal when a direct solver is cheap and the target eigenvalue is known approximately. Shift-invert with CG/GMRES replace the dense solve with sparse iteration.
Dense QR Algorithm
For dense matrices (N ≲ 10³), all eigenvalues can be found via the QR iteration:
Lanczos Algorithm — Large Sparse
The Lanczos algorithm (1950) builds an orthonormal Krylov basis and reduces A to a symmetric tridiagonal matrix T of size k×k (k ≪ N). The eigenvalues of T (Ritz values) approximate the extreme eigenvalues of A. Only k SpMVs are needed — ideal for finding the 10–100 largest or smallest eigenvalues of sparse A.
In practice, loss of orthogonality accumulates — requiring either full re-orthogonalisation (ARPACK's IRLM approach) or selective orthogonalisation (Partial Lanczos). ARPACK (Arnoldi Package) is the standard library: its Python binding is SciPy's `scipy.sparse.linalg.eigsh` and `eigs`.
7. SVD, PCA & Mode Decomposition
The Singular Value Decomposition of an m×n matrix A:
The Eckart-Young theorem states that the best rank-k approximation to A (in any orthogonally invariant norm) is:
Proper Orthogonal Decomposition (POD)
In fluid dynamics and structural dynamics, POD (the SVD of a snapshot matrix) extracts the dominant spatial modes of a simulation. If U = [u(t₁) | u(t₂) | ... | u(tₙ)] contains N velocity snapshots, then:
- Columns of U (left singular vectors) are the POD spatial modes, ordered by energy content σᵢ²
- k modes capture fraction (σ₁² + ... + σₖ²) / ‖U‖_F² of total variance
- Projecting the PDE onto r POD modes gives a reduced-order model (ROM) of size r ≪ N
Dynamic Mode Decomposition (DMD)
DMD (Schmid, 2010) finds a best-fit linear operator that maps snapshot Uₖ to Uₖ₊₁:
DMD extracts spatiotemporal coherent structures from simulation data even without access to the governing equations — making it a data-driven reduced modeling tool applicable to turbulence, financial time series, and climate data.
8. JavaScript Sparse Conjugate Gradient
Below is a complete, self-contained implementation of a CSR sparse matrix and Preconditioned Conjugate Gradient solver in JavaScript. This is exactly the precision Poisson solver used in grid-based fluid simulations:
// ============================================================
// Sparse CSR matrix + Preconditioned Conjugate Gradient (PCG)
// Solves A·x = b where A is SPD and sparse (CSR format)
// ============================================================
class SparseCSR {
constructor(n) {
this.n = n;
this.rowptr = new Int32Array(n + 1);
this.cols = []; // will be converted to Int32Array
this.vals = []; // Float64Array after build
}
// Assemble from COO triplets [{row, col, val}]
static fromCOO(n, coo) {
// Sort by row then col
coo.sort((a, b) => a.row !== b.row ? a.row - b.row : a.col - b.col);
const mat = new SparseCSR(n);
const cols = [], vals = [];
const rowptr = new Int32Array(n + 1);
for (const {row, col, val} of coo) {
rowptr[row + 1]++;
cols.push(col);
vals.push(val);
}
for (let i = 1; i <= n; i++) rowptr[i] += rowptr[i - 1];
mat.rowptr = rowptr;
mat.cols = new Int32Array(cols);
mat.vals = new Float64Array(vals);
return mat;
}
// y = A·x (sparse matrix-vector product)
matvec(x, y) {
const {n, rowptr, cols, vals} = this;
for (let i = 0; i < n; i++) {
let s = 0;
for (let j = rowptr[i]; j < rowptr[i + 1]; j++) s += vals[j] * x[cols[j]];
y[i] = s;
}
}
// Diagonal (Jacobi) preconditioner
diagInv() {
const d = new Float64Array(this.n);
const {n, rowptr, cols, vals} = this;
for (let i = 0; i < n; i++) {
for (let j = rowptr[i]; j < rowptr[i + 1]; j++) {
if (cols[j] === i) { d[i] = 1 / vals[j]; break; }
}
}
return d;
}
}
// Solves A·x = b using Preconditioned CG
// Returns {x, iter, residual}
function pcg(A, b, opts = {}) {
const {tol = 1e-6, maxIter = 1000} = opts;
const n = A.n;
const diag = A.diagInv(); // Jacobi preconditioner M⁻¹
const x = new Float64Array(n);
const r = new Float64Array(n);
const z = new Float64Array(n);
const p = new Float64Array(n);
const Ap = new Float64Array(n);
// r₀ = b − A·x₀ (x₀ = 0 → r₀ = b)
for (let i = 0; i < n; i++) r[i] = b[i];
// z₀ = M⁻¹·r₀ (Jacobi: componentwise multiply)
let rz = 0;
for (let i = 0; i < n; i++) { z[i] = r[i] * diag[i]; p[i] = z[i]; rz += r[i] * z[i]; }
let iter = 0;
while (iter < maxIter) {
A.matvec(p, Ap);
let pAp = 0;
for (let i = 0; i < n; i++) pAp += p[i] * Ap[i];
const alpha = rz / pAp;
let rnorm2 = 0;
for (let i = 0; i < n; i++) {
x[i] += alpha * p[i];
r[i] -= alpha * Ap[i];
rnorm2 += r[i] * r[i];
}
if (Math.sqrt(rnorm2) < tol) break;
let rzNew = 0;
for (let i = 0; i < n; i++) { z[i] = r[i] * diag[i]; rzNew += r[i] * z[i]; }
const beta = rzNew / rz;
rz = rzNew;
for (let i = 0; i < n; i++) p[i] = z[i] + beta * p[i];
iter++;
}
return {x, iter, residual: Math.sqrt(rz)};
}
// ---- Example: 1D Poisson on uniform grid, 5 interior points ----
// −u'' = f, u(0)=u(1)=0, f=1 → u_i = xi(1−xi)/2
const N = 5;
const h = 1 / (N + 1);
const h2 = h * h;
// Build tridiagonal Laplacian matrix in COO
const coo = [];
for (let i = 0; i < N; i++) {
coo.push({row: i, col: i, val: 2 / h2});
if (i > 0) coo.push({row: i, col: i-1, val: -1 / h2});
if (i < N - 1) coo.push({row: i, col: i+1, val: -1 / h2});
}
const A = SparseCSR.fromCOO(N, coo);
const b = new Float64Array(N).fill(1); // f=1
const {x, iter} = pcg(A, b, {tol: 1e-10});
console.log("PCG converged in", iter, "iterations");
x.forEach((u, i) => {
const xi = (i + 1) * h;
const ref = xi * (1 - xi) / 2;
console.log(`u[${i}] = ${u.toFixed(8)} (ref: ${ref.toFixed(8)})`);
});