Deep Dive #1 — SPH Fluid Dynamics: From Navier-Stokes to 60 FPS in the Browser

Why SPH?

The Navier-Stokes equations describe fluid motion, but solving them on a fixed Eulerian grid requires expensive pressure-projection steps and complex free-surface tracking. For an interactive, visual simulation where the user pours, splashes, and stirs the fluid, a Lagrangian particle approach is a much better fit. Smoothed Particle Hydrodynamics (SPH, first described by Gingold & Monaghan 1977) represents the fluid as a cloud of particles that carry mass, velocity, density, and pressure. The equations of motion are evaluated entirely at particle positions — no grid, no mesh.

The trade-off: neighbour searches are O(N²) naïvely, which is catastrophic at the 2 000+ particles we need for a satisfying visual. Almost all the engineering effort in the simulation went into making those neighbour searches fast.

The Physics: SPH in Five Equations

Every quantity A at particle i is estimated by summing the contributions of neighbouring particles j, weighted by a smoothing kernel W(r, h):

A(xᵢ) ≈ Σⱼ  mⱼ · (Aⱼ / ρⱼ) · W(|xᵢ - xⱼ|, h)

where:
  mⱼ  = mass of particle j
  ρⱼ  = density at particle j
  h   = smoothing radius (kernel support)
  W   = cubic spline kernel

From this single interpolation formula, the key fluid quantities follow:

Density — sum of weighted neighbour masses at each particle.
Pressure — equation of state: p = k(ρ − ρ₀), where k is a stiffness parameter and ρ₀ is rest density. Weakly compressible SPH (WCSPH) tolerates ~1 % density deviation, which is visually imperceptible.
Pressure force — symmetric gradient: Fᵢ_pressure = −Σⱼ mⱼ(pᵢ+pⱼ)/(2ρⱼ) · ∇W
Viscosity force — Laplacian-based: Fᵢ_viscosity = μ Σⱼ mⱼ(vⱼ−vᵢ)/ρⱼ · ∇²W
Surface tension — Müller's colour field curvature approach — expensive but essential for realistic droplets.

The Spatial Hash Grid

The naive O(N²) neighbour search — compare every particle against every other particle — is instantly fatal at scale. With 2 000 particles that's 4 M comparisons per frame at 60 FPS: 240 M comparisons per second in JavaScript. Not going to happen.

The solution is a spatial hash grid: a flat hash map from 2D cell coordinates to lists of particle indices. Each cell has side length equal to the kernel radius h, so any two particles that could be neighbours must be in the same or adjacent cells. The neighbour search is then:

// Build phase — O(N)
for each particle i:
    cell = floor(position[i] / h)
    hashGrid[hash(cell)].push(i)

// Query phase — O(1) amortised per particle
for each particle i:
    for dx in [-1, 0, 1]:
        for dy in [-1, 0, 1]:
            cell = floor(position[i] / h) + (dx, dy)
            for j in hashGrid[hash(cell)]:
                if |xᵢ - xⱼ| < h:
                    process_neighbour(i, j)

The hash function hash(cx, cy) = (cx * 73856093 ^ cy * 19349663) % TABLE_SIZE is a standard spatial locality hash. TABLE_SIZE is set to the next prime above 2× the expected particle count to keep load factor below 0.5.

TypedArrays: Keeping GC Out of the Hot Path

Each particle's state — position, velocity, density, pressure — lives in a Float32Array with a fixed stride of 8 floats per particle: [x, y, vx, vy, density, pressure, ax, ay]. Keeping data in typed arrays eliminates object allocation and garbage collection pressure in the inner integration loop. Cache-friendly layout means the CPU prefetcher can predict access patterns.

Integration: Leapfrog Not Euler

Simple Euler integration (v += a·dt; x += v·dt) is energy-inconsistent — it slowly adds energy to the system, causing particles to accelerate and eventually explode. We use Leapfrog (Störmer-Verlet), which is symplectic (exactly preserves a discrete analogue of energy) and second-order accurate at the same cost as Euler:

// Leapfrog: positions and velocities are offset by half a timestep
v_half  = v + 0.5 * a_prev * dt     // kick
x_new   = x + v_half * dt            // drift
a_new   = compute_forces(x_new)
v_new   = v_half + 0.5 * a_new * dt // kick

// Store for next frame
v = v_new;  x = x_new;  a_prev = a_new;

The timestep Δt is clamped using the CFL condition: dt = 0.4 * h / v_max. If particles move faster than one kernel radius per step, the neighbourhood search breaks down.

Boundary Handling: No Ghost Particles

Many SPH implementations use "ghost particles" to enforce boundary conditions — extra particles placed outside the domain that provide repulsion. Ghost particles double the particle count and complicate the hash grid. We instead use a simpler penalty force: when a particle crosses a wall, a spring-like repulsion proportional to penetration depth pushes it back. Paired with velocity damping on bounce, this gives clean wall collisions with no tunnelling at the particle sizes we use.

Rendering: Points to Metaballs via Screen-Space

Rendering individual particles as circles looks like a bag of beads, not fluid. The trick is screen-space fluid rendering:

Render each particle as a WebGL point sprite with a smooth Gaussian alpha profile, writing depth into an offscreen buffer.
Apply bilateral blur (edge-preserving) to smooth the depth field across neighbouring particles while preserving sharp interfaces.
Reconstruct per-pixel surface normals from the blurred depth gradient (dFdx, dFdy in GLSL).
Apply Phong shading with a Fresnel-based water colour — blue highlight near specular, translucent at grazing angles.

The whole render pipeline is two draw calls: particles → depth FBO, then full-screen quad → final composite. On a mid-range GPU it adds less than 0.3 ms to the frame.

Performance Numbers

Optimisation	Before	After	Gain
O(N²) → spatial hash	4 FPS (500 p)	60 FPS (500 p)	15×
Objects → TypedArrays	60 FPS (500 p)	60 FPS (2 000 p)	4× particles
Euler → Leapfrog	Explodes at dt > 0.008	Stable at dt = 0.016	2× dt
Screen-space blur pass	Beads look	Continuous fluid surface	Visual

Key insight: The spatial hash grid accounts for ~90 % of the total performance budget win. If you're building any O(N²) particle simulation, the hash grid is the first and most impactful optimisation to reach for.

What I'd Do Differently Today

The simulation works well, but revisiting it now I'd make three changes:

WebGPU Compute Shaders — move the entire SPH integration to a compute shader. The neighbour search maps beautifully to parallel reduction on GPU. Potential to push to 10 000+ particles on modern hardware.
PCISPH pressure solver — the current WCSPH requires small timesteps to control density error. Predictive-corrective SPH (PCISPH) iterates to converge pressure, allowing larger timesteps and stiffer incompressibility at the same frame rate.
Adaptive kernel radius — currently h is fixed, which means sparse regions under-sample and dense regions waste computation. Adaptive h based on local density would let the simulation handle both free-flying droplets and dense pools with one particle count.

You can explore the simulation right now at mysimulator.uk/fluid/. Drop objects, change viscosity, crank up the particle count — all the sliders expose the parameters described above.