WebGL2 (based on OpenGL ES 3.0; shipped in Chrome and Firefox in 2017 and enabled by default in Safari 15 in 2021) made several features core that WebGL1 lacked or offered only through extensions: Vertex Array Objects (VAO), Transform Feedback, multiple render targets (MRT), integer textures, instanced rendering, and, with the EXT_color_buffer_float extension, floating-point framebuffers. Together these enable practical GPGPU computation entirely in the browser without WebGPU. This article covers the most important advanced patterns: VAOs, Transform Feedback, ping-pong framebuffers, MRT, and Uniform Buffer Objects.
1. WebGL2 vs WebGL1 — Feature Delta
| Feature | WebGL1 | WebGL2 |
|---|---|---|
| Vertex Array Objects (VAO) | Extension only (OES_vertex_array_object) | Core |
| Transform Feedback | Not available | Core |
| Float textures (RGBA32F) | OES_texture_float extension for sampling; rendering needs WEBGL_color_buffer_float | Core for sampling; renderable with EXT_color_buffer_float |
| Integer textures | No | RGBA8UI, RGBA32I, etc. |
| Multiple Render Targets (MRT) | WEBGL_draw_buffers extension | Core (drawBuffers) |
| Instanced rendering | ANGLE_instanced_arrays extension | Core |
| Uniform Buffer Objects (UBO) | No | Core |
| 3D textures | No | Core (TEXTURE_3D) |
| GLSL version | #version 100 (ES 1.0) | #version 300 es (ES 3.0) |
2. Vertex Array Objects (VAO)
In WebGL1, every draw call requires rebinding all vertex attribute
pointers: bindBuffer, vertexAttribPointer,
enableVertexAttribArray for each attribute. VAOs
encapsulate all this state once and replay it with a single
gl.bindVertexArray(vao).
// WebGL2 — create and populate a VAO once
const vao = gl.createVertexArray();
gl.bindVertexArray(vao);
// positions
const posBuf = gl.createBuffer();
gl.bindBuffer(gl.ARRAY_BUFFER, posBuf);
gl.bufferData(gl.ARRAY_BUFFER, positions, gl.DYNAMIC_COPY);
gl.enableVertexAttribArray(0);
gl.vertexAttribPointer(0, 3, gl.FLOAT, false, 0, 0);
// velocities
const velBuf = gl.createBuffer();
gl.bindBuffer(gl.ARRAY_BUFFER, velBuf);
gl.bufferData(gl.ARRAY_BUFFER, velocities, gl.DYNAMIC_COPY);
gl.enableVertexAttribArray(1);
gl.vertexAttribPointer(1, 3, gl.FLOAT, false, 0, 0);
gl.bindVertexArray(null);
// Later, per frame:
gl.bindVertexArray(vao);
gl.drawArrays(gl.POINTS, 0, N_PARTICLES);
gl.bindVertexArray(null);
VAOs are especially important for scenes with many different meshes —
each mesh gets its own VAO. Switching from mesh A to mesh B costs one
bindVertexArray instead of 10–20 state changes.
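The one-VAO-per-mesh pattern can be wrapped in a small helper. The sketch below is illustrative only: `createMeshVAO` and the shape of its `attributes` argument are names assumed here, not part of WebGL2 or of the listing above. Only the `gl.*` calls are real API.

```javascript
// Hypothetical helper: builds one VAO from a list of attribute descriptions.
// attributes: [{ location, size, data }], where data is a Float32Array.
function createMeshVAO(gl, attributes) {
  const vao = gl.createVertexArray();
  gl.bindVertexArray(vao);
  const buffers = attributes.map(({ location, size, data }) => {
    const buf = gl.createBuffer();
    gl.bindBuffer(gl.ARRAY_BUFFER, buf);
    gl.bufferData(gl.ARRAY_BUFFER, data, gl.STATIC_DRAW);
    gl.enableVertexAttribArray(location);
    // Tightly packed floats: stride 0, offset 0.
    gl.vertexAttribPointer(location, size, gl.FLOAT, false, 0, 0);
    return buf;
  });
  // Unbind so later attribute calls cannot modify this VAO by accident.
  gl.bindVertexArray(null);
  gl.bindBuffer(gl.ARRAY_BUFFER, null);
  return { vao, buffers };
}
```

At draw time each mesh then needs only `gl.bindVertexArray(mesh.vao)` followed by its draw call.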
3. Transform Feedback: GPU Particle Systems
Transform Feedback allows the output of the vertex shader to be captured back into a GPU buffer, bypassing rasterisation. This enables GPU-side computation loops without ever reading data back to the CPU:
Setup skeleton
// 1. Declare TF varyings BEFORE linking the program
gl.transformFeedbackVaryings(
  updateProgram,
  ['v_position', 'v_velocity', 'v_age'],
  gl.SEPARATE_ATTRIBS // or INTERLEAVED_ATTRIBS
);
gl.linkProgram(updateProgram);
// 2. Create transform feedback object
const tf = gl.createTransformFeedback();
gl.bindTransformFeedback(gl.TRANSFORM_FEEDBACK, tf);
gl.bindBufferBase(gl.TRANSFORM_FEEDBACK_BUFFER, 0, bufB_pos);
gl.bindBufferBase(gl.TRANSFORM_FEEDBACK_BUFFER, 1, bufB_vel);
gl.bindBufferBase(gl.TRANSFORM_FEEDBACK_BUFFER, 2, bufB_age);
gl.bindTransformFeedback(gl.TRANSFORM_FEEDBACK, null);
// 3. Per frame — physics update pass (no rasterisation)
gl.useProgram(updateProgram);
gl.bindVertexArray(vaoA); // read from buffer A
gl.bindTransformFeedback(gl.TRANSFORM_FEEDBACK, tf); // write to B
gl.enable(gl.RASTERIZER_DISCARD); // skip fragment shader
gl.beginTransformFeedback(gl.POINTS);
gl.drawArrays(gl.POINTS, 0, N);
gl.endTransformFeedback();
gl.disable(gl.RASTERIZER_DISCARD);
gl.bindTransformFeedback(gl.TRANSFORM_FEEDBACK, null);
// 4. Render pass — draw buffer B
gl.useProgram(renderProgram);
gl.bindVertexArray(vaoB);
gl.drawArrays(gl.POINTS, 0, N);
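// 5. Swap A ↔ B for next frame: exchange vaoA/vaoB and use a second
//    transform feedback object whose outputs are bound to the A buffers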
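Step 5 is left as a comment in the skeleton above. One way to implement the swap, sketched under assumptions: each side of the ping-pong owns a VAO that reads its buffers and a transform feedback object whose outputs are bound to those same buffers. The `sim` object layout and the `stepParticles` name are hypothetical.

```javascript
// Hypothetical state layout:
//   sim = { read: { vao, tf }, write: { vao, tf } }
// where side.vao reads that side's buffers and side.tf writes into them.
function stepParticles(gl, sim, updateProgram, count) {
  gl.useProgram(updateProgram);
  gl.bindVertexArray(sim.read.vao);                               // source state
  gl.bindTransformFeedback(gl.TRANSFORM_FEEDBACK, sim.write.tf);  // destination state
  gl.enable(gl.RASTERIZER_DISCARD);
  gl.beginTransformFeedback(gl.POINTS);
  gl.drawArrays(gl.POINTS, 0, count);
  gl.endTransformFeedback();
  gl.disable(gl.RASTERIZER_DISCARD);
  gl.bindTransformFeedback(gl.TRANSFORM_FEEDBACK, null);
  gl.bindVertexArray(null);
  // Swap roles: the buffers just written become next frame's input.
  [sim.read, sim.write] = [sim.write, sim.read];
  return sim.read.vao; // VAO holding the newest state, ready for the render pass
}
```

Keeping reads and writes on different buffers matters: WebGL2 reports INVALID_OPERATION for a draw in which the same buffer is bound both as a transform feedback output and as a vertex input.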
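The vertex shader for the physics update receives the current position, velocity, and age as attributes and writes the next state to the outputs captured by Transform Feedback. Because particle state never leaves the GPU, there is no per-frame CPU-GPU transfer, and the update is limited by GPU memory bandwidth (hundreds of GB/s on discrete GPUs, considerably less on integrated and mobile parts).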
GLSL 300 es physics vertex shader (excerpt)
#version 300 es
precision highp float;
layout(location=0) in vec3 a_position;
layout(location=1) in vec3 a_velocity;
layout(location=2) in float a_age;
out vec3 v_position;
out vec3 v_velocity;
out float v_age;
uniform float u_dt;
uniform vec3 u_gravity;
uniform float u_lifespan;
void main() {
  vec3 vel = a_velocity + u_gravity * u_dt;
  vec3 pos = a_position + vel * u_dt;
  // respawn dead particles at origin with random velocity
  float age = a_age + u_dt;
  if (age > u_lifespan) {
    // use gl_VertexID as seed for pseudo-random respawn
    float s = sin(float(gl_VertexID) * 127.1 + age * 311.7);
    vel = normalize(vec3(s, fract(s*1.61), fract(s*2.72))) * 2.0;
    pos = vec3(0.0);
    age = 0.0;
  }
  // simple ground bounce
  if (pos.y < 0.0) { pos.y = 0.0; vel.y = abs(vel.y) * 0.6; }
  v_position = pos;
  v_velocity = vel;
  v_age = age;
}
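For unit tests, or as a starting point for a CPU fallback, the same update can be mirrored in JavaScript. This is a sketch rather than the article's implementation: `updateParticleCPU` and its particle object shape are assumed names, and double-precision results on the CPU will differ slightly from the GPU's 32-bit floats, especially in the `sin`-based respawn hash.

```javascript
const fract = (x) => x - Math.floor(x);

// Hypothetical CPU mirror of the update shader for a single particle.
// p = { pos: [x, y, z], vel: [x, y, z], age }; id plays the role of gl_VertexID.
function updateParticleCPU(p, id, dt, gravity, lifespan) {
  let vel = p.vel.map((v, i) => v + gravity[i] * dt);
  let pos = p.pos.map((x, i) => x + vel[i] * dt);
  let age = p.age + dt;
  if (age > lifespan) {
    // Same pseudo-random respawn as the shader, seeded by the particle index.
    const s = Math.sin(id * 127.1 + age * 311.7);
    const dir = [s, fract(s * 1.61), fract(s * 2.72)];
    const len = Math.hypot(...dir);
    vel = dir.map((d) => (d / len) * 2.0);
    pos = [0, 0, 0];
    age = 0;
  }
  // Simple ground bounce, losing 40% of the vertical speed.
  if (pos[1] < 0) {
    pos[1] = 0;
    vel[1] = Math.abs(vel[1]) * 0.6;
  }
  return { pos, vel, age };
}
```

As a worked step: a particle at rest at y = 1 with gravity (0, -9.8, 0) and dt = 0.1 ends the update with a vertical velocity of about -0.98 and a height of about 0.902.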
4. GPGPU Ping-Pong Framebuffers
For 2-D simulations (fluid, reaction-diffusion, game of life, wave equations), the preferred GPGPU pattern is the ping-pong texture: two framebuffers each backed by a float texture. Each frame reads from "A" and writes to "B", then swaps.
// Setup — RGBA32F textures (requires EXT_color_buffer_float)
const ext = gl.getExtension('EXT_color_buffer_float');
if (!ext) throw new Error('EXT_color_buffer_float not available');
function makeFloatFBO(w, h) {
  const tex = gl.createTexture();
  gl.bindTexture(gl.TEXTURE_2D, tex);
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA32F, w, h, 0,
                gl.RGBA, gl.FLOAT, null);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
  const fbo = gl.createFramebuffer();
  gl.bindFramebuffer(gl.FRAMEBUFFER, fbo);
  gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0,
                          gl.TEXTURE_2D, tex, 0);
  // Fail early if this format cannot be rendered to on the current device
  if (gl.checkFramebufferStatus(gl.FRAMEBUFFER) !== gl.FRAMEBUFFER_COMPLETE) {
    throw new Error('Float framebuffer is incomplete');
  }
  gl.bindFramebuffer(gl.FRAMEBUFFER, null);
  return { fbo, tex };
}
let [A, B] = [makeFloatFBO(W, H), makeFloatFBO(W, H)];
// Per frame:
gl.bindFramebuffer(gl.FRAMEBUFFER, B.fbo);
gl.bindTexture(gl.TEXTURE_2D, A.tex);
// draw fullscreen quad with simulation shader
[A, B] = [B, A]; // swap
The simulation kernel runs entirely in the fragment shader: each texel
computes its new state from its own previous value and its neighbours (via
texture(uState, uv + offset)). This approach powers WebGL
fluid simulations, reaction-diffusion systems, cellular automata, and
custom neural-network inference in the browser.
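As a concrete kernel, here is a sketch of Conway's Game of Life written for the ping-pong setup. It is an illustration, not code from the article: the names `LIFE_FRAG_SRC` and `lifeStepCPU` are assumptions, live cells are assumed to be stored as 1.0 in the red channel, and the shader uses `texelFetch` with integer coordinates instead of `texture(uState, uv + offset)` so that no texel-size uniform is needed. The JavaScript function applies the same rule on the CPU, which is handy for checking a readback in tests.

```javascript
// Fragment shader source; #version must be the very first line of the string.
const LIFE_FRAG_SRC = `#version 300 es
precision highp float;
uniform sampler2D uState;   // previous generation, live cells have r = 1.0
out vec4 outState;
void main() {
  ivec2 size = textureSize(uState, 0);
  ivec2 cell = ivec2(gl_FragCoord.xy);
  int neighbours = 0;
  for (int dy = -1; dy <= 1; dy++) {
    for (int dx = -1; dx <= 1; dx++) {
      if (dx == 0 && dy == 0) continue;
      ivec2 q = (cell + ivec2(dx, dy) + size) % size;   // wrap at the edges
      neighbours += texelFetch(uState, q, 0).r > 0.5 ? 1 : 0;
    }
  }
  bool alive = texelFetch(uState, cell, 0).r > 0.5;
  bool next = alive ? (neighbours == 2 || neighbours == 3) : (neighbours == 3);
  outState = vec4(next ? 1.0 : 0.0, 0.0, 0.0, 1.0);
}`;

// CPU reference of the same rule. src is a Uint8Array of 0/1 values in
// row-major order; one loop iteration corresponds to one fragment invocation.
function lifeStepCPU(src, w, h) {
  const dst = new Uint8Array(w * h);
  for (let y = 0; y < h; y++) {
    for (let x = 0; x < w; x++) {
      let n = 0;
      for (let dy = -1; dy <= 1; dy++) {
        for (let dx = -1; dx <= 1; dx++) {
          if (dx === 0 && dy === 0) continue;
          n += src[((y + dy + h) % h) * w + ((x + dx + w) % w)];
        }
      }
      const alive = src[y * w + x] === 1;
      dst[y * w + x] = (alive ? n === 2 || n === 3 : n === 3) ? 1 : 0;
    }
  }
  return dst;
}
```

Like the GPU version, the CPU reference writes into a fresh destination and never updates cells in place, which is exactly why two buffers are needed.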
5. Multiple Render Targets (MRT)
A single render pass can write to multiple textures simultaneously using
gl.drawBuffers([gl.COLOR_ATTACHMENT0, gl.COLOR_ATTACHMENT1, …])
with multiple layout(location=N) fragment shader outputs:
// Fragment shader with MRT
#version 300 es
precision highp float;
in vec3 v_worldPos;
in vec3 v_normal;
in vec2 v_uv;
uniform sampler2D u_albedoTex;
layout(location=0) out vec4 out_position; // G-Buffer position
layout(location=1) out vec4 out_normal; // G-Buffer normal
layout(location=2) out vec4 out_albedo; // G-Buffer albedo
void main() {
  // gl_FragDepth is an output; the fragment's depth is read from gl_FragCoord.z
  out_position = vec4(v_worldPos, gl_FragCoord.z);
  out_normal = vec4(normalize(v_normal) * 0.5 + 0.5, 1.0);
  out_albedo = texture(u_albedoTex, v_uv);
}
MRT is the foundation of deferred shading: fill a G-Buffer in one geometry pass (vertex processing and material evaluation run once), then apply all lights in a separate pass that samples the G-Buffer. This decouples scene complexity from light count: the cost is roughly O(geometry) + O(lights × pixels) instead of O(geometry × lights).
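On the JavaScript side, the three shader outputs need a framebuffer with three colour attachments. The following is a minimal sketch under assumptions: `createGBuffer` is a hypothetical helper, and the choice of RGBA16F for position and normal and RGBA8 for albedo is one reasonable option, not something the article prescribes.

```javascript
// Hypothetical helper: creates a framebuffer with three colour attachments
// matching out_position, out_normal and out_albedo, plus a depth buffer.
// RGBA16F attachments are only renderable with EXT_color_buffer_float.
function createGBuffer(gl, width, height) {
  if (!gl.getExtension('EXT_color_buffer_float')) {
    throw new Error('EXT_color_buffer_float not available');
  }
  const fbo = gl.createFramebuffer();
  gl.bindFramebuffer(gl.FRAMEBUFFER, fbo);

  const targets = [
    { attachment: gl.COLOR_ATTACHMENT0, format: gl.RGBA16F }, // position
    { attachment: gl.COLOR_ATTACHMENT1, format: gl.RGBA16F }, // normal
    { attachment: gl.COLOR_ATTACHMENT2, format: gl.RGBA8 },   // albedo
  ];
  const textures = targets.map(({ attachment, format }) => {
    const tex = gl.createTexture();
    gl.bindTexture(gl.TEXTURE_2D, tex);
    gl.texStorage2D(gl.TEXTURE_2D, 1, format, width, height); // one mip level
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
    gl.framebufferTexture2D(gl.FRAMEBUFFER, attachment, gl.TEXTURE_2D, tex, 0);
    return tex;
  });

  const depth = gl.createRenderbuffer();
  gl.bindRenderbuffer(gl.RENDERBUFFER, depth);
  gl.renderbufferStorage(gl.RENDERBUFFER, gl.DEPTH_COMPONENT24, width, height);
  gl.framebufferRenderbuffer(gl.FRAMEBUFFER, gl.DEPTH_ATTACHMENT,
                             gl.RENDERBUFFER, depth);

  // Route fragment outputs 0..2 to the three attachments.
  gl.drawBuffers(targets.map((t) => t.attachment));

  if (gl.checkFramebufferStatus(gl.FRAMEBUFFER) !== gl.FRAMEBUFFER_COMPLETE) {
    throw new Error('G-buffer framebuffer is incomplete');
  }
  gl.bindFramebuffer(gl.FRAMEBUFFER, null);
  return { fbo, textures, depth };
}
```

The `drawBuffers` setting is stored with the framebuffer object, so it only has to be set once at creation time. In the lighting pass the three textures are bound to texture units and sampled like any other texture.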
6. Uniform Buffer Objects (UBO)
WebGL1 requires one gl.uniform*() call per uniform per
program. When switching between many materials or draw calls this
creates CPU driver overhead. UBOs pack a block of
uniforms into a GPU buffer, shared across multiple shader programs via a
binding point:
// GLSL 300 es — declare UBO
layout(std140) uniform CameraUniforms {
  mat4 u_view;
  mat4 u_projection;
  vec3 u_cameraPos;
  float u_near;
  float u_far;
};
// JS — create and bind UBO
const ubo = gl.createBuffer();
gl.bindBuffer(gl.UNIFORM_BUFFER, ubo);
gl.bufferData(gl.UNIFORM_BUFFER, cameraData, gl.DYNAMIC_DRAW);
const blockIdx = gl.getUniformBlockIndex(program, 'CameraUniforms');
gl.uniformBlockBinding(program, blockIdx, 0); // binding point 0
gl.bindBufferBase(gl.UNIFORM_BUFFER, 0, ubo);
// Update once per frame, all programs sharing binding 0 see new data
gl.bufferSubData(gl.UNIFORM_BUFFER, 0, newCameraData);
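The listing does not show how `cameraData` is laid out, and std140 padding is a common source of bugs. A sketch of one way to pack the CameraUniforms block follows. The helper name `packCameraUniforms` and the constant `CAMERA_UBO_FLOATS` are assumptions, and the matrices are assumed to be 16-element column-major arrays.

```javascript
// Layout of CameraUniforms under std140 (float index = byte offset / 4):
//   u_view        mat4   floats  0..15
//   u_projection  mat4   floats 16..31
//   u_cameraPos   vec3   floats 32..34  (vec3 is aligned to 16 bytes)
//   u_near        float  float  35      (fits in the slot left after the vec3)
//   u_far         float  float  36
//   padding              floats 37..39  (rounds the buffer up to 160 bytes)
const CAMERA_UBO_FLOATS = 40;

function packCameraUniforms(view, projection, cameraPos, near, far,
                            out = new Float32Array(CAMERA_UBO_FLOATS)) {
  out.set(view, 0);
  out.set(projection, 16);
  out.set(cameraPos, 32);
  out[35] = near;
  out[36] = far;
  return out;
}
```

The result can be passed straight to `gl.bufferData` or `gl.bufferSubData`. Rather than trusting hand-computed offsets, the layout can be checked at runtime with `gl.getActiveUniformBlockParameter(program, blockIdx, gl.UNIFORM_BLOCK_DATA_SIZE)` and `gl.getActiveUniforms(program, indices, gl.UNIFORM_OFFSET)`.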
7. Practical Performance Tips
| Bottleneck | Diagnosis | Fix |
|---|---|---|
| Too many draw calls | CPU-bound; GPU idle >50% | Merge geometry; use instancing; VAO batching |
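| Buffer readback (readPixels) | Stalls the GPU pipeline; large frame-time spikes | Keep results on the GPU (Transform Feedback, ping-pong textures); if readback is unavoidable, read into a pixel pack buffer (PBO) and fetch the data about two frames later |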
| Uniform overhead | Many gl.uniform*() calls | Switch to UBO; pack related data together |
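| Texture format | 3-channel RGB data; missing mipmaps on minified textures | Use RGBA (4-byte aligned); generate mipmaps for minified colour textures; use gl.NEAREST for float data (linear filtering of RGBA32F needs OES_texture_float_linear) |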
| Fragment shader ALU bound | Long fragment shader; overdraw | Precompute to textures; reduce overdraw; early-z |
| TF overhead | Rasterizer discard not set | Always set RASTERIZER_DISCARD during TF update passes |
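The PBO readback mentioned in the table can be sketched as two functions: one that queues the copy and one that is polled on later frames. `startReadback` and `pollReadback` are hypothetical names, and the sketch assumes the bound read framebuffer has an RGBA32F colour attachment so that RGBA/FLOAT is a valid readPixels combination.

```javascript
// Queue an asynchronous copy of a framebuffer region into a pixel pack buffer.
function startReadback(gl, x, y, width, height) {
  const byteLength = width * height * 4 * 4;   // RGBA, 4 bytes per float
  const pbo = gl.createBuffer();
  gl.bindBuffer(gl.PIXEL_PACK_BUFFER, pbo);
  gl.bufferData(gl.PIXEL_PACK_BUFFER, byteLength, gl.STREAM_READ);
  // With a PIXEL_PACK_BUFFER bound, the last argument is a byte offset
  // into that buffer and the call does not wait for the GPU.
  gl.readPixels(x, y, width, height, gl.RGBA, gl.FLOAT, 0);
  gl.bindBuffer(gl.PIXEL_PACK_BUFFER, null);
  const sync = gl.fenceSync(gl.SYNC_GPU_COMMANDS_COMPLETE, 0);
  gl.flush(); // make sure the fence is actually submitted
  return { pbo, sync };
}

// Returns true once the data has been copied into dst (a Float32Array of
// width * height * 4 elements), false while the GPU is still working.
function pollReadback(gl, job, dst) {
  const status = gl.clientWaitSync(job.sync, 0, 0);   // timeout 0: never block
  if (status === gl.WAIT_FAILED) throw new Error('clientWaitSync failed');
  if (status === gl.TIMEOUT_EXPIRED) return false;    // try again next frame
  gl.deleteSync(job.sync);
  gl.bindBuffer(gl.PIXEL_PACK_BUFFER, job.pbo);
  gl.getBufferSubData(gl.PIXEL_PACK_BUFFER, 0, dst);
  gl.bindBuffer(gl.PIXEL_PACK_BUFFER, null);
  gl.deleteBuffer(job.pbo);
  return true;
}
```

A typical loop calls `startReadback` after rendering and then `pollReadback` once per `requestAnimationFrame` until it returns true, so the CPU never blocks on the GPU.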
8. Interactive: Transform Feedback Particle System
This demo runs a 100 000-particle gravity fountain entirely on the GPU using Transform Feedback ping-pong buffers in WebGL2. Each particle has a position (xyz), a velocity (xyz), and an age. The update shader advances the physics (gravity, bounce, respawn), and the render shader draws each particle as a point sprite coloured by age. No particle arrays are kept on the CPU.
If your browser does not support WebGL2 Transform Feedback, a JavaScript fallback will render a reduced 10 000-particle CPU simulation instead.
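The choice between the two paths comes down to whether a WebGL2 context can be created. The sketch below shows one way to make that decision. The particle counts are the ones stated above, while the function name `chooseParticleBackend` and its return shape are assumptions for illustration and not the demo's actual code.

```javascript
// Hypothetical selector: returns the GPU path when a WebGL2 context is
// available and the reduced CPU path otherwise.
function chooseParticleBackend(canvas) {
  const gl = canvas.getContext('webgl2');
  if (gl) {
    return { mode: 'gpu', gl, particleCount: 100000 };
  }
  return { mode: 'cpu', gl: null, particleCount: 10000 };
}
```

`getContext('webgl2')` returns null when WebGL2 is unsupported or disabled, so no exception handling is needed for the common case.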