BVH: Bounding Volume Hierarchy for Ray Tracing & Collision Detection
Rendering a scene with millions of triangles via brute-force ray-triangle intersection would take minutes per frame. A BVH reduces that to milliseconds — by organising geometry into a tree of nested bounding boxes and skipping entire subtrees the moment a ray misses their box. The same data structure powers real-time collision detection in game engines and physically-based path tracers alike.
1. The Acceleration Problem
A path tracer shoots millions of rays per frame. Each ray must find the nearest intersection with scene geometry. Without acceleration, testing one ray against N triangles costs O(N). With N = 10 million triangles and 4 million rays per frame at 60 fps, that is 2.4 × 10¹⁵ triangle tests per second — roughly 10⁶× beyond current hardware capability.
Acceleration structures solve this by culling large parts of the scene before per-triangle tests. A BVH reduces expected ray cost from O(N) to O(log N) for uniformly distributed scenes — a 10-million-triangle scene requires ~23 bounding box tests plus a handful of triangle tests per ray, rather than 10 million.
2. Axis-Aligned Bounding Boxes
Each node in a BVH contains an AABB (Axis-Aligned Bounding Box) — the smallest box with faces parallel to the coordinate planes that encloses all primitives in that node's subtree.
Ray-AABB intersection (slab method)
Testing a ray against an AABB is extremely cheap — 6 divisions and a few comparisons — making it the fast-reject step before expensive triangle tests:
3. BVH Construction: Strategies
A BVH is a binary tree. Each internal node holds an AABB and pointers to two children. Leaf nodes hold one or more primitives. The key question is: how do we partition a set of primitives into two groups to form two child nodes?
Centroid Median Split
The simplest approach: compute the centroids of all primitives in the current node. Split along the axis with the greatest centroid spread, at the median centroid value. Each primitive goes to the child whose bounding box contains its centroid.
Object Partitioning vs Spatial Partitioning
Object partitioning (what BVH does) assigns each primitive to exactly one node. Spatial partitioning (kd-trees) clips primitives at the split plane — a triangle spanning the boundary goes to both children. Object partitioning is simpler and avoids primitive duplication; spatial partitioning can be tighter for large triangles.
4. Surface Area Heuristic (SAH)
The Surface Area Heuristic chooses the split that minimises the expected cost of a random ray hitting the node. It uses the geometric fact that the probability of a random ray hitting a convex object is proportional to its surface area (for uniformly distributed rays in a bounding sphere).
Binned SAH (Wald 2007) is the standard in production renderers. It approximates the exact SAH optimum (which requires O(N log N) per level) at nearly the same quality using only K=32 bins. Embree, Optix, and virtually all production path tracers use binned SAH or variants of it.
Leaf Termination
Recursion stops when a node has ≤ N_leaf primitives (typically 1–4 triangles). Smaller leaves improve traversal quality (fewer false positives) but increase node count and memory. Setting N_leaf = 4 and using 4-wide SIMD to test all 4 triangles simultaneously is a common optimisation in CPU renderers (packet tracing).
5. Ray-BVH Traversal
Given a ray and a BVH root, find the nearest intersection. The standard algorithm uses a stack to avoid recursion (GPU-friendly):
The near-child-first ordering is critical for performance: by visiting the nearer child first, tBest is updated to a smaller value sooner, allowing farther child AABBs to be culled more aggressively ("early exit with tighter t").
Packet and SIMD Traversal
CPU renderers test 4 or 8 rays in parallel (a "ray packet") using SSE/AVX SIMD. All rays in a packet share the same traversal stack as long as they follow the same path through the BVH — "frustum culling" at each node using the packet's shared bounding frustum. Divergent packets fall back to individual traversal. This gives 2–4× speedup on coherent rays (primary rays from a camera) but less benefit on scattered secondary rays.
6. GPU BVH: LBVH with Morton Codes
CPU SAH builds (top-down, sequential) are too slow for dynamic scenes that need rebuild every frame. Linear BVH (LBVH) builds the tree bottom-up from a Morton code sort in ~5–20 ms on a GPU — fast enough for real-time skeletal animation or particle systems.
LBVH quality is lower than SAH — Morton-ordered trees can have poor splits when primitives cluster — but is 20–50× faster to build. HLBVH (Hierarchical LBVH) uses SAH splitting for the top levels of the tree and LBVH for subtrees, combining build speed with good traversal quality.
7. TLAS / BLAS in Modern RT APIs
DirectX Raytracing (DXR), Vulkan Ray Tracing, and Metal RT use a two-level BVH structure that separates geometric detail from instance transforms:
In DXR, BuildRaytracingAccelerationStructure() takes a
D3D12_BUILD_RAYTRACING_ACCELERATION_STRUCTURE_INPUTS
descriptor specifying geometry lists (for BLAS) or instance
descriptors (for TLAS). The driver handles all internal BVH format
details, Morton sorting, and SAH — the application only provides
geometry and instances.
Update vs Rebuild
A BLAS can be updated (refit without structural change) much faster than rebuilt from scratch. Update is correct when vertices move slightly (animation), but quality degrades as motion increases. The heuristic: use update if vertex displacement is < 10% of primitive size; otherwise rebuild. RTX 4090 refit: ~0.5 ms for 1M triangles; rebuild: ~5 ms.
8. BVH for Collision Detection
Game engines use AABB trees (a variant of BVH) for broad-phase collision detection — quickly finding pairs of objects that might be colliding before running expensive narrow-phase geometry tests (GJK, EPA).
Dynamic BVH (DBVH)
For game physics, objects move every frame. Rebuilding the full BVH each frame is wasteful — instead, a dynamic BVH uses fat AABBs (extended by a "margin" in each direction) and lazy refitting:
-
Each moving object expands its AABB by a velocity-proportional
margin:
fat_aabb = aabb.expand(v × Δt × 2) - The tree is only restructured for an object when its AABB grows outside its stored fat AABB.
- Restructuring uses tree rotations (similar to AVL rotations) to maintain balance.
Bullet Physics and Box2D use a dynamic AABB tree (incrementally maintained) for broad-phase. Update cost per moved object is O(log N) amortised, much better than full rebuilds.
Bounding Volume Choices Beyond AABB
- Spheres: Simplest intersection test, but poor fit for thin triangles. Used in early game engines and for dynamic objects.
- OBB (Oriented Bounding Box): Tightest fit, 16 separating axis tests for OBB-OBB. Better quality but 5–10× slower intersection test.
- k-DOP (Discrete Oriented Polytope): Convex polytope with k/2 pairs of parallel faces at fixed orientations. 18-DOP (k=18) gives excellent fit for most models with only modest extra test cost. Used in Unreal Engine's collision system.