WebGL2 Advanced: VAO, Transform Feedback & GPGPU Ping-Pong

WebGL2 (based on OpenGL ES 3.0; shipped in Chrome and Firefox in 2017 and in Safari 15 in 2021) adds several features missing from core WebGL1: Vertex Array Objects (VAO), Transform Feedback, multiple render targets (MRT), integer textures, instanced rendering, and, via the EXT_color_buffer_float extension, renderable floating-point framebuffers. Together these make practical GPGPU computation possible in the browser without WebGPU. This article covers the three most important advanced patterns (VAOs, Transform Feedback, and ping-pong framebuffers), then looks briefly at MRT, uniform buffer objects, and performance tips.

1. WebGL2 vs WebGL1 — Feature Delta

Feature	WebGL1	WebGL2
Vertex Array Objects (VAO)	Extension only (OES_vertex_array_object)	Core
Transform Feedback	Not available	Core
Float textures (RGBA32F)	OES_texture_float extension; rendering to them is not guaranteed	Core for sampling; rendering requires EXT_color_buffer_float
Integer textures	No	RGBA8UI, RGBA32I, etc.
Multiple Render Targets (MRT)	WEBGL_draw_buffers extension	Core (drawBuffers)
Instanced rendering	ANGLE_instanced_arrays extension	Core
Uniform Buffer Objects (UBO)	No	Core
3D textures	No	Core (TEXTURE_3D)
GLSL version	#version 100 (ES 1.0)	#version 300 es (ES 3.0)
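
The table maps fairly directly onto a start-up capability check. The following is a minimal sketch, assuming a hypothetical helper name (detectGLCapabilities) and a canvas supplied by the caller; the extension strings are the ones named in the table, plus EXT_color_buffer_float and its WebGL1 counterpart WEBGL_color_buffer_float.

```javascript
// Hypothetical helper (not part of any library): reports which of the
// features in the table are usable on the given canvas.
function detectGLCapabilities(canvas) {
  const gl2 = canvas.getContext('webgl2');
  if (gl2) {
    return {
      version: 2,
      gl: gl2,
      vao: true,               // core in WebGL2
      instancing: true,        // core in WebGL2
      drawBuffers: true,       // core in WebGL2
      transformFeedback: true, // core in WebGL2
      // Rendering to float textures still needs an extension in WebGL2.
      floatRenderTargets: gl2.getExtension('EXT_color_buffer_float') !== null,
    };
  }

  const gl1 = canvas.getContext('webgl');
  if (!gl1) return { version: 0, gl: null };

  // In WebGL1 every feature below is optional and exposed as an extension.
  return {
    version: 1,
    gl: gl1,
    vao: gl1.getExtension('OES_vertex_array_object') !== null,
    instancing: gl1.getExtension('ANGLE_instanced_arrays') !== null,
    drawBuffers: gl1.getExtension('WEBGL_draw_buffers') !== null,
    transformFeedback: false, // no WebGL1 equivalent
    floatRenderTargets: gl1.getExtension('WEBGL_color_buffer_float') !== null,
  };
}
```

A caller would typically branch on caps.version and caps.floatRenderTargets before choosing between the GPU paths described below and a CPU fallback.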

2. Vertex Array Objects (VAO)

In WebGL1 without the OES_vertex_array_object extension, every draw call must re-specify the vertex attribute state: bindBuffer, vertexAttribPointer, and enableVertexAttribArray for each attribute. A VAO records this state once and restores it with a single gl.bindVertexArray(vao).

// WebGL2 — create and populate a VAO once
const vao = gl.createVertexArray();
gl.bindVertexArray(vao);

// positions
const posBuf = gl.createBuffer();
gl.bindBuffer(gl.ARRAY_BUFFER, posBuf);
gl.bufferData(gl.ARRAY_BUFFER, positions, gl.DYNAMIC_COPY);
gl.enableVertexAttribArray(0);
gl.vertexAttribPointer(0, 3, gl.FLOAT, false, 0, 0);

// velocities
const velBuf = gl.createBuffer();
gl.bindBuffer(gl.ARRAY_BUFFER, velBuf);
gl.bufferData(gl.ARRAY_BUFFER, velocities, gl.DYNAMIC_COPY);
gl.enableVertexAttribArray(1);
gl.vertexAttribPointer(1, 3, gl.FLOAT, false, 0, 0);

gl.bindVertexArray(null);

// Later, per frame:
gl.bindVertexArray(vao);
gl.drawArrays(gl.POINTS, 0, N_PARTICLES);
gl.bindVertexArray(null);

VAOs are especially important for scenes with many different meshes — each mesh gets its own VAO. Switching from mesh A to mesh B costs one bindVertexArray instead of 10–20 state changes.
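
The one-VAO-per-mesh idea can be wrapped in a small factory. This is a sketch only: createVAO and the { location, size, data } descriptor shape are assumptions made for illustration, and it handles only tightly packed float attributes in separate buffers, like the listing above.

```javascript
// Hypothetical helper: builds one VAO from a list of attribute descriptions.
// Each entry is { location, size, data }, where data is a Float32Array.
function createVAO(gl, attributes, usage = gl.STATIC_DRAW) {
  const vao = gl.createVertexArray();
  gl.bindVertexArray(vao);

  const buffers = attributes.map(({ location, size, data }) => {
    const buf = gl.createBuffer();
    gl.bindBuffer(gl.ARRAY_BUFFER, buf);
    gl.bufferData(gl.ARRAY_BUFFER, data, usage);
    gl.enableVertexAttribArray(location);
    // Tightly packed floats: stride 0, offset 0.
    gl.vertexAttribPointer(location, size, gl.FLOAT, false, 0, 0);
    return buf;
  });

  // Unbind so later attribute calls cannot modify this VAO by accident.
  gl.bindVertexArray(null);
  gl.bindBuffer(gl.ARRAY_BUFFER, null);
  return { vao, buffers };
}
```

With this in place, each mesh is created once, for example createVAO(gl, [{ location: 0, size: 3, data: positions }]), and drawing it later needs only gl.bindVertexArray(mesh.vao) followed by the draw call.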

3. Transform Feedback: GPU Particle Systems

Transform Feedback allows the output of the vertex shader to be captured back into a GPU buffer, bypassing rasterisation. This enables GPU-side computation loops without ever reading data back to the CPU:

VBO_A (read) → [Vertex Shader: physics update] → TF → VBO_B (write)
VBO_B (read) → [Vertex Shader: physics update] → TF → VBO_A (write)
(ping-pong: the two buffers swap roles every frame)

Setup skeleton

// 1. Declare TF varyings BEFORE linking the program
gl.transformFeedbackVaryings(
  updateProgram,
  ['v_position', 'v_velocity', 'v_age'],
  gl.SEPARATE_ATTRIBS   // or INTERLEAVED_ATTRIBS
);
gl.linkProgram(updateProgram);

// 2. Create transform feedback object
const tf = gl.createTransformFeedback();
gl.bindTransformFeedback(gl.TRANSFORM_FEEDBACK, tf);
gl.bindBufferBase(gl.TRANSFORM_FEEDBACK_BUFFER, 0, bufB_pos);
gl.bindBufferBase(gl.TRANSFORM_FEEDBACK_BUFFER, 1, bufB_vel);
gl.bindBufferBase(gl.TRANSFORM_FEEDBACK_BUFFER, 2, bufB_age);
gl.bindTransformFeedback(gl.TRANSFORM_FEEDBACK, null);

// 3. Per frame — physics update pass (no rasterisation)
gl.useProgram(updateProgram);
gl.bindVertexArray(vaoA);         // read from buffer A
gl.bindTransformFeedback(gl.TRANSFORM_FEEDBACK, tf);  // write to B
gl.enable(gl.RASTERIZER_DISCARD); // skip fragment shader
gl.beginTransformFeedback(gl.POINTS);
gl.drawArrays(gl.POINTS, 0, N);
gl.endTransformFeedback();
gl.disable(gl.RASTERIZER_DISCARD);
gl.bindTransformFeedback(gl.TRANSFORM_FEEDBACK, null);

// 4. Render pass — draw buffer B
gl.useProgram(renderProgram);
gl.bindVertexArray(vaoB);
gl.drawArrays(gl.POINTS, 0, N);

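// 5. Swap A ↔ B for next frame: read through vaoB and write through a
//    second transform feedback object whose buffers are the A set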

The physics vertex shader receives the current position, velocity, and age as attributes and writes the next state to outputs (varyings) that Transform Feedback captures. There is no CPU-GPU data transfer, so the update is limited by GPU memory bandwidth (on the order of hundreds of GB/s on discrete GPUs) rather than by transfers across the bus.
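
The skeleton leaves the swap as a comment. One way to complete it, sketched here under assumptions, is to give each side of the pair its own VAO (for reading) and its own transform feedback object (for writing), then exchange the two sides after every update. The names createSide, stepParticles, and the sim object are hypothetical, as are the bufA_* buffers mirroring bufB_* from the skeleton.

```javascript
// Hypothetical helper: one side of the ping-pong pair. `vao` reads this side's
// buffers as attributes; the transform feedback object writes into them.
function createSide(gl, vao, buffers) {
  const tf = gl.createTransformFeedback();
  gl.bindTransformFeedback(gl.TRANSFORM_FEEDBACK, tf);
  buffers.forEach((buf, i) => {
    gl.bindBufferBase(gl.TRANSFORM_FEEDBACK_BUFFER, i, buf);
  });
  gl.bindTransformFeedback(gl.TRANSFORM_FEEDBACK, null);
  // Precaution: clear the generic binding point so the buffer is not left
  // bound there while it is later read as a vertex attribute.
  gl.bindBuffer(gl.TRANSFORM_FEEDBACK_BUFFER, null);
  return { vao, tf };
}

// Runs one update pass (read side -> write side), then swaps the roles.
// Returns the VAO that now holds the newest state, for the render pass.
function stepParticles(gl, sim, updateProgram, count) {
  gl.useProgram(updateProgram);
  gl.bindVertexArray(sim.read.vao);
  gl.bindTransformFeedback(gl.TRANSFORM_FEEDBACK, sim.write.tf);
  gl.enable(gl.RASTERIZER_DISCARD);
  gl.beginTransformFeedback(gl.POINTS);
  gl.drawArrays(gl.POINTS, 0, count);
  gl.endTransformFeedback();
  gl.disable(gl.RASTERIZER_DISCARD);
  gl.bindTransformFeedback(gl.TRANSFORM_FEEDBACK, null);
  gl.bindVertexArray(null);

  [sim.read, sim.write] = [sim.write, sim.read];
  return sim.read.vao;
}
```

Setup would look like const sim = { read: createSide(gl, vaoA, [bufA_pos, bufA_vel, bufA_age]), write: createSide(gl, vaoB, [bufB_pos, bufB_vel, bufB_age]) }. Keeping the VAO and the transform feedback object together per side means the swap is a single exchange and the read and write buffers can never be the same set.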

GLSL 300 es physics vertex shader (excerpt)

#version 300 es
precision highp float;

layout(location=0) in vec3 a_position;
layout(location=1) in vec3 a_velocity;
layout(location=2) in float a_age;

out vec3 v_position;
out vec3 v_velocity;
out float v_age;

uniform float u_dt;
uniform vec3  u_gravity;
uniform float u_lifespan;

void main() {
  vec3 vel = a_velocity + u_gravity * u_dt;
  vec3 pos = a_position + vel * u_dt;

  // respawn dead particles at origin with random velocity
  float age = a_age + u_dt;
  if (age > u_lifespan) {
    // use gl_VertexID as seed for pseudo-random respawn
    float s = sin(float(gl_VertexID) * 127.1 + age * 311.7);
    vel = normalize(vec3(s, fract(s*1.61), fract(s*2.72))) * 2.0;
    pos = vec3(0.0);
    age = 0.0;
  }

  // simple ground bounce
  if (pos.y < 0.0) { pos.y = 0.0; vel.y = abs(vel.y) * 0.6; }

  v_position = pos;
  v_velocity = vel;
  v_age      = age;
}
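
A CPU mirror of the same update is handy for checking the shader logic and could serve as a starting point for a JavaScript fallback. The sketch below is an assumption-laden translation, not the article's own fallback code: stepParticle is a hypothetical name, vectors are plain [x, y, z] arrays, and results will differ slightly from the GPU because JavaScript computes in 64-bit floats.

```javascript
// Same definition as GLSL fract(): x - floor(x).
const fract = (x) => x - Math.floor(x);

// CPU mirror of the update shader for one particle (hypothetical helper).
// `vertexId` plays the role of gl_VertexID; `gravity` is an [x, y, z] array.
function stepParticle(p, vertexId, dt, gravity, lifespan) {
  // Semi-implicit Euler: update velocity first, then move with the new velocity.
  let vel = p.velocity.map((v, i) => v + gravity[i] * dt);
  let pos = p.position.map((x, i) => x + vel[i] * dt);
  let age = p.age + dt;

  // Respawn dead particles at the origin with a pseudo-random direction.
  if (age > lifespan) {
    const s = Math.sin(vertexId * 127.1 + age * 311.7);
    const dir = [s, fract(s * 1.61), fract(s * 2.72)];
    const len = Math.hypot(dir[0], dir[1], dir[2]);
    vel = dir.map((d) => (d / len) * 2.0);
    pos = [0, 0, 0];
    age = 0;
  }

  // Simple ground bounce that keeps 60% of the vertical speed.
  if (pos[1] < 0) {
    pos[1] = 0;
    vel[1] = Math.abs(vel[1]) * 0.6;
  }

  return { position: pos, velocity: vel, age };
}
```

Calling stepParticle once per particle per frame reproduces the three branches of the shader (integrate, respawn, bounce) in the same order.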

4. GPGPU Ping-Pong Framebuffers

For 2-D simulations (fluid, reaction-diffusion, Game of Life, wave equations), the preferred GPGPU pattern is the ping-pong texture: two framebuffers, each backed by a float texture. Each frame samples texture "A" while rendering into framebuffer "B", then swaps the two, because a texture cannot be sampled and rendered to in the same draw call.

// Setup — RGBA32F textures (requires EXT_color_buffer_float)
const ext = gl.getExtension('EXT_color_buffer_float');
if (!ext) throw new Error('EXT_color_buffer_float not available');

function makeFloatFBO(w, h) {
  const tex = gl.createTexture();
  gl.bindTexture(gl.TEXTURE_2D, tex);
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA32F, w, h, 0,
                gl.RGBA, gl.FLOAT, null);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);

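  const fbo = gl.createFramebuffer();
  gl.bindFramebuffer(gl.FRAMEBUFFER, fbo);
  gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0,
                          gl.TEXTURE_2D, tex, 0);
  // Fail early if the driver cannot render to this format
  if (gl.checkFramebufferStatus(gl.FRAMEBUFFER) !== gl.FRAMEBUFFER_COMPLETE) {
    throw new Error('Float framebuffer is incomplete');
  }
  gl.bindFramebuffer(gl.FRAMEBUFFER, null);
  return { fbo, tex };
}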

let [A, B] = [makeFloatFBO(W, H), makeFloatFBO(W, H)];

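// Per frame:
gl.bindFramebuffer(gl.FRAMEBUFFER, B.fbo);  // write to B
gl.viewport(0, 0, W, H);                    // match the simulation grid size
gl.activeTexture(gl.TEXTURE0);
gl.bindTexture(gl.TEXTURE_2D, A.tex);       // read from A
// draw fullscreen quad with simulation shader
[A, B] = [B, A]; // swap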

The simulation kernel runs entirely in the fragment shader: each texel computes its new state from its own previous value and that of its neighbours (via texture(uState, uv + offset)). This approach powers WebGL fluid simulations, reaction-diffusion systems, cellular automata, and custom neural network inference in the browser.
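
To make the read-from-A, write-to-B discipline concrete without any GL state, here is a CPU analogue using Conway's Game of Life on two Float32Array grids. It is an illustrative sketch: lifeStep and runLife are hypothetical names, and cells outside the grid are treated as dead, which is a simplification rather than the CLAMP_TO_EDGE behaviour of the textures above.

```javascript
// CPU analogue of one ping-pong pass: every cell reads only from `src` and
// writes only to `dst`, just as the fragment shader samples A and renders to B.
function lifeStep(src, dst, w, h) {
  for (let y = 0; y < h; y++) {
    for (let x = 0; x < w; x++) {
      let n = 0; // live neighbour count
      for (let dy = -1; dy <= 1; dy++) {
        for (let dx = -1; dx <= 1; dx++) {
          if (dx === 0 && dy === 0) continue;
          const nx = x + dx;
          const ny = y + dy;
          // Assumption of this sketch: out-of-grid cells count as dead.
          if (nx >= 0 && nx < w && ny >= 0 && ny < h) n += src[ny * w + nx];
        }
      }
      const alive = src[y * w + x] === 1;
      dst[y * w + x] = (n === 3 || (alive && n === 2)) ? 1 : 0;
    }
  }
}

// Ping-pong driver: the two arrays swap roles after every step.
function runLife(initial, w, h, steps) {
  let a = Float32Array.from(initial);
  let b = new Float32Array(w * h);
  for (let i = 0; i < steps; i++) {
    lifeStep(a, b, w, h);
    [a, b] = [b, a];
  }
  return a; // after the final swap, `a` holds the newest state
}
```

Writing into a separate array is what keeps the update correct: if lifeStep wrote into src, later cells would see a mix of old and new neighbours, which is the same hazard the two-texture setup avoids on the GPU.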

5. Multiple Render Targets (MRT)

A single render pass can write to multiple textures simultaneously using gl.drawBuffers([gl.COLOR_ATTACHMENT0, gl.COLOR_ATTACHMENT1, …]) with multiple layout(location=N) fragment shader outputs:

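#version 300 es
// Fragment shader with MRT (#version must stay on the first line of the source)
precision highp float;

in vec3 v_worldPos;
in vec3 v_normal;
in vec2 v_uv;
uniform sampler2D u_albedoTex;

layout(location=0) out vec4 out_position;  // G-Buffer position
layout(location=1) out vec4 out_normal;    // G-Buffer normal
layout(location=2) out vec4 out_albedo;    // G-Buffer albedo

void main() {
  // gl_FragCoord.z is the fragment's window-space depth; gl_FragDepth is an
  // output and is undefined if read before being written
  out_position = vec4(v_worldPos, gl_FragCoord.z);
  out_normal   = vec4(normalize(v_normal) * 0.5 + 0.5, 1.0);
  out_albedo   = texture(u_albedoTex, v_uv);
}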

MRT is the foundation of deferred shading: fill a G-Buffer in one pass (expensive vertex + material evaluation), then apply all lights in a separate pass sampling the G-Buffer. This decouples scene complexity from light count: O(geometry) + O(lights) instead of O(geometry × lights).
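
On the JavaScript side, the G-Buffer is a framebuffer with one colour attachment per fragment output. The sketch below shows one possible setup; createGBuffer is a hypothetical helper, and the choice of RGBA16F for position and normal with RGBA8 for albedo is an assumption made for illustration, not a requirement.

```javascript
// Hypothetical helper: an FBO with three colour attachments plus depth,
// one attachment per layout(location=N) fragment output.
function createGBuffer(gl, width, height) {
  // RGBA16F render targets need EXT_color_buffer_float in WebGL2.
  if (!gl.getExtension('EXT_color_buffer_float')) {
    throw new Error('EXT_color_buffer_float not available');
  }
  const formats = [gl.RGBA16F, gl.RGBA16F, gl.RGBA8]; // position, normal, albedo

  const fbo = gl.createFramebuffer();
  gl.bindFramebuffer(gl.FRAMEBUFFER, fbo);

  const textures = formats.map((format, i) => {
    const tex = gl.createTexture();
    gl.bindTexture(gl.TEXTURE_2D, tex);
    gl.texStorage2D(gl.TEXTURE_2D, 1, format, width, height);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
    gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0 + i,
                            gl.TEXTURE_2D, tex, 0);
    return tex;
  });

  // Depth buffer so the geometry pass can depth-test as usual.
  const depth = gl.createRenderbuffer();
  gl.bindRenderbuffer(gl.RENDERBUFFER, depth);
  gl.renderbufferStorage(gl.RENDERBUFFER, gl.DEPTH_COMPONENT24, width, height);
  gl.framebufferRenderbuffer(gl.FRAMEBUFFER, gl.DEPTH_ATTACHMENT,
                             gl.RENDERBUFFER, depth);

  // Route fragment output location N to COLOR_ATTACHMENTN.
  gl.drawBuffers(textures.map((_, i) => gl.COLOR_ATTACHMENT0 + i));

  const status = gl.checkFramebufferStatus(gl.FRAMEBUFFER);
  gl.bindFramebuffer(gl.FRAMEBUFFER, null);
  if (status !== gl.FRAMEBUFFER_COMPLETE) throw new Error('G-buffer incomplete');
  return { fbo, textures, depth };
}
```

The draw-buffer mapping is stored with the framebuffer, so it only needs to be set once; the lighting pass then binds the three textures as ordinary samplers.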

6. Uniform Buffer Objects (UBO)

WebGL1 requires one gl.uniform*() call per uniform per program. When switching between many materials or draw calls this creates CPU driver overhead. UBOs pack a block of uniforms into a GPU buffer, shared across multiple shader programs via a binding point:

// GLSL 300 es — declare UBO
layout(std140) uniform CameraUniforms {
  mat4 u_view;
  mat4 u_projection;
  vec3 u_cameraPos;
  float u_near;
  float u_far;
};

// JS — create and bind UBO
const ubo = gl.createBuffer();
gl.bindBuffer(gl.UNIFORM_BUFFER, ubo);
gl.bufferData(gl.UNIFORM_BUFFER, cameraData, gl.DYNAMIC_DRAW);
const blockIdx = gl.getUniformBlockIndex(program, 'CameraUniforms');
gl.uniformBlockBinding(program, blockIdx, 0);  // binding point 0
gl.bindBufferBase(gl.UNIFORM_BUFFER, 0, ubo);

// Update once per frame; all programs sharing binding 0 see the new data
gl.bindBuffer(gl.UNIFORM_BUFFER, ubo);
gl.bufferSubData(gl.UNIFORM_BUFFER, 0, newCameraData);
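
The contents of cameraData must follow the std140 layout rules, where a vec3 is aligned to 16 bytes and a following float fills its unused fourth slot. The sketch below packs the CameraUniforms block on that basis; packCameraUniforms is a hypothetical helper, the matrices are assumed to be 16-element column-major arrays, and the offsets are worth confirming at runtime with gl.getActiveUniforms(program, indices, gl.UNIFORM_OFFSET).

```javascript
// std140 layout of the CameraUniforms block, in float (4-byte) indices:
//   u_view        floats  0..15  (byte offset   0)
//   u_projection  floats 16..31  (byte offset  64)
//   u_cameraPos   floats 32..34  (byte offset 128, vec3 aligned to 16 bytes)
//   u_near        float  35      (byte offset 140, fills the vec3's spare slot)
//   u_far         float  36      (byte offset 144)
//   padding       floats 37..39  (buffer rounded up to 160 bytes)
function packCameraUniforms(view, projection, cameraPos, near, far) {
  const data = new Float32Array(40);
  data.set(view, 0);
  data.set(projection, 16);
  data.set(cameraPos, 32);
  data[35] = near;
  data[36] = far;
  return data;
}
```

The returned array can be passed directly as the data argument of gl.bufferData or gl.bufferSubData. Rounding the buffer up to 160 bytes is a conservative choice so it is at least as large as the block size the driver reports.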

7. Practical Performance Tips

Bottleneck	Diagnosis	Fix
Too many draw calls	CPU-bound; GPU idle >50%	Merge geometry; use instancing; VAO batching
Buffer readback (readPixels)	Stalls GPU pipeline; huge frame time spike	Use Transform Feedback or Pixel Pack Buffer (PBO) with 2-frame delay
Uniform overhead	Many gl.uniform*() calls	Switch to UBO; pack related data together
Texture format	3-channel RGB textures; minified textures without mipmaps	Use RGBA (4-byte aligned); generate mipmaps for minified colour textures; prefer gl.NEAREST for float data
Fragment shader ALU bound	Long fragment shader; overdraw	Precompute to textures; reduce overdraw; early-z
TF overhead	Rasterizer discard not set	Always set RASTERIZER_DISCARD during TF update passes
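
For the readback row, the non-stalling pattern is to copy pixels into a pixel pack buffer, insert a fence, and collect the data a frame or two later once the fence has signalled. The sketch below assumes an RGBA32F framebuffer is currently bound for reading; startReadback and tryFinishReadback are hypothetical helper names.

```javascript
// Kick off an asynchronous readback of an RGBA32F region into a PBO.
function startReadback(gl, x, y, w, h) {
  const byteLength = w * h * 4 * 4; // 4 channels x 4 bytes per float
  const pbo = gl.createBuffer();
  gl.bindBuffer(gl.PIXEL_PACK_BUFFER, pbo);
  gl.bufferData(gl.PIXEL_PACK_BUFFER, byteLength, gl.STREAM_READ);
  // With a PBO bound, the last argument is a byte offset into the buffer,
  // so this call returns without waiting for the GPU.
  gl.readPixels(x, y, w, h, gl.RGBA, gl.FLOAT, 0);
  gl.bindBuffer(gl.PIXEL_PACK_BUFFER, null);

  const sync = gl.fenceSync(gl.SYNC_GPU_COMMANDS_COMPLETE, 0);
  gl.flush(); // make sure the fence is actually submitted
  return { pbo, sync, w, h };
}

// Poll once per frame. Returns null while the GPU is still busy, or a
// Float32Array with the pixels once the copy has completed.
function tryFinishReadback(gl, job) {
  const status = gl.clientWaitSync(job.sync, 0, 0); // timeout 0: never block
  if (status === gl.TIMEOUT_EXPIRED) return null;
  if (status === gl.WAIT_FAILED) throw new Error('clientWaitSync failed');

  gl.deleteSync(job.sync);
  const out = new Float32Array(job.w * job.h * 4);
  gl.bindBuffer(gl.PIXEL_PACK_BUFFER, job.pbo);
  gl.getBufferSubData(gl.PIXEL_PACK_BUFFER, 0, out);
  gl.bindBuffer(gl.PIXEL_PACK_BUFFER, null);
  gl.deleteBuffer(job.pbo);
  return out;
}
```

A typical loop calls startReadback after the simulation pass and then tryFinishReadback at the start of each following frame until it returns data, which is where the multi-frame delay in the table comes from.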

8. Interactive: Transform Feedback Particle System

This demo runs a 100 000-particle gravity fountain entirely on the GPU using Transform Feedback ping-pong buffers in WebGL2. Each particle has a position (xyz), a velocity (xyz), and an age. The update shader advances the physics (gravity, bounce, respawn), and the render shader draws each particle as a point sprite coloured by age. There are no CPU-side particle arrays.

If your browser does not support WebGL2 Transform Feedback, a JavaScript fallback will render a reduced 10 000-particle CPU simulation instead.
