Why WebGPU Compute?
WebGL has compute shaders — sort of. Transform feedback and render-to-texture can be abused as GPGPU, but only for vector operations that fit into a texture coordinate. WebGPU's compute pipelines are first-class: arbitrary data access, read-write storage buffers, atomic operations, and workgroup shared memory. This is what CUDA and Metal Compute gave native apps; now it's in the browser.
Our two target simulations — the Lattice-Boltzmann D2Q9 flow and the SPH fluid — are both embarrassingly parallel at the cell/particle level. The collision step visits every cell independently. The density estimation step visits every particle pair within a radius. These are textbook GPGPU workloads.
Benchmark: JavaScript vs WebGPU
| Simulation | Grid / Count | JS (fps) | WebGPU (fps) | Speedup |
|---|---|---|---|---|
| Lattice-Boltzmann 2D | 512 × 512 | 4 | 60 | 15× |
| LBM 2D high-res | 1024 × 1024 | <1 | 48 | ~50× |
| Lattice-Boltzmann 3D | 128³ (new!) | impossible | 24 | ∞ |
| SPH Fluid | 4 000 particles | 18 | 60 | 3.3× |
| SPH Fluid | 16 000 particles | 1 | 38 | 38× |
The WGSL Compute Shader
WGSL (WebGPU Shading Language) is the language of WebGPU shaders. It's closer to Rust than GLSL — explicit types, no implicit casts, explicit binding groups. Here's the core LBM collision step:
// binding layout: group 0, bindings 0-2
@group(0) @binding(0) var<storage, read> f_in : array<f32>;
@group(0) @binding(1) var<storage, read_write> f_out : array<f32>;
@group(0) @binding(2) var<uniform> params: Params;
struct Params { nx: u32, ny: u32, tau: f32 }
@compute @workgroup_size(16, 16)
fn lbm_collision(@builtin(global_invocation_id) id: vec3<u32>) {
let x = id.x; let y = id.y;
if (x >= params.nx || y >= params.ny) { return; }
let base = (y * params.nx + x) * 9u;
var rho = 0.0; var ux = 0.0; var uy = 0.0;
// Compute density and velocity from distribution functions
for (var i = 0u; i < 9u; i++) {
let fi = f_in[base + i];
rho += fi;
ux += fi * EX[i];
uy += fi * EY[i];
}
ux /= rho; uy /= rho;
// BGK collision: f_out = f_in - (f_in - f_eq) / tau
let usq = ux*ux + uy*uy;
for (var i = 0u; i < 9u; i++) {
let eu = EX[i]*ux + EY[i]*uy;
let feq = rho * W[i] * (1.0 + 3.0*eu + 4.5*eu*eu - 1.5*usq);
f_out[base + i] = f_in[base + i] - (f_in[base + i] - feq) / params.tau;
}
}
Fallback Strategy
WebGPU has ~65% browser support as of late 2026 (Chrome 113+, Edge, Safari Technology Preview, Firefox behind a flag). For the other 35% we fall back to the JavaScript version. The detection is a single feature check:
async function initSimulation() {
if ('gpu' in navigator) {
const adapter = await navigator.gpu.requestAdapter();
if (adapter) {
const device = await adapter.requestDevice();
return new LBMSimulatorWebGPU(device); // fast path
}
}
return new LBMSimulatorJS(); // fallback — same API, JS implementation
}
The key insight: both the WebGPU and JavaScript
implementations expose exactly the same step() and
getVelocityField() interface. The rendering code
doesn't know which backend it's talking to. This means the fallback
is completely transparent to the user — they just see a slower
simulation on unsupported browsers.
WebGPU Gotchas We Hit
⚠️ Buffer alignment
WGSL storage buffers must be 256-byte aligned. Our 9-float per-cell LBM buffer needed explicit padding to avoid silent data corruption on some GPUs.
⚠️ No dynamic indexing in uniforms
The D2Q9 weight and direction constants can't be in a uniform buffer with dynamic index. We embedded them as WGSL constants (compile-time arrays).
✅ Workgroup size tuning
16×16 workgroups outperformed 8×8 and 32×32 on most tested hardware. AMD and Apple Silicon preferred 8×8. We expose this as a quality preset.
✅ Streaming from compute to render
The velocity field buffer is bound as both a compute storage buffer and a WebGPU vertex buffer. No CPU readback needed — the data lives on GPU the entire frame.