Linear Algebra for 3D Graphics: Vectors, Matrices, Quaternions

1. Vectors: Points, Directions, and Operations

A vector in 3D graphics is a tuple (x, y, z) that represents either a point in space or a direction with magnitude. The distinction matters: translating a direction does not change the direction, but translating a point changes its position.

Essential Operations

// Dot product: scalar result a \cdot b = aₓbₓ + a_y b_y + a_z b_z = |a||b|cosθ Uses: - Angle between vectors: θ = arccos(a\cdotb / |a||b|) - Project a onto b: proj = (a\cdotb̂)\cdotb̂ where b̂ = b/|b| - Diffuse lighting: N\cdotL = cos(angle to light) for Lambert shading - Backface culling: if (N\cdotV < 0) skip triangle // Cross product: vector result (perpendicular to both inputs) a \times b = (a_y b_z - a_z b_y, a_z bₓ - aₓ b_z, aₓ b_y - a_y bₓ) |a \times b| = |a||b|sinθ Uses: - Triangle normal: N = (B-A) \times (C-A), then normalize - Orthonormal basis construction (e.g. camera frame) - Torque: τ = r \times F in physics simulation - Handedness check: (a\timesb)\cdotc determines coordinate system orientation // Normalization: make unit length â = a / |a| where |a| = sqrt(aₓ² + a_y² + a_z²) Note: normalizing a zero vector is undefined (NaN guard needed)

Convention clash: OpenGL and WebGL use column vectors with right-multiplication (M·v). DirectX and HLSL use row vectors with left-multiplication (v·M). The same geometric transform requires the transpose of the matrix between the two conventions. GLM (C++) uses OpenGL convention; glMatrix (JavaScript) also uses column vectors with column-major storage (same array layout as OpenGL).

2. Matrices and Transform Types

A 3×3 matrix encodes any linear transform: rotation, scale, shear. A 4×4 matrix encodes any affine transform in 3D: linear transforms plus translation (via homogeneous coordinates). GPU shaders and all 3D APIs work with 4×4 matrices.

Rotation Matrices (3×3)

// Rotation about X axis by angle θ Rₓ(θ) = | 1 0 0 | | 0 cosθ -sinθ | | 0 sinθ cosθ | // Rotation about Y axis by angle θ R_y(θ) = | cosθ 0 sinθ | | 0 1 0 | | -sinθ 0 cosθ | // Rotation about Z axis by angle θ R_z(θ) = | cosθ -sinθ 0 | | sinθ cosθ 0 | | 0 0 1 | // Rotation about arbitrary unit axis â by angle θ (Rodrigues) R = cosθ\cdotI + (1-cosθ)\cdotâaᵀ + sinθ\cdot[â]\times where [â]\times is the skew-symmetric cross product matrix of â

Scale & Translation

// Non-uniform scale (4\times4 homogeneous) S(sₓ, s_y, s_z) = | sₓ 0 0 0 | | 0 s_y 0 0 | | 0 0 s_z 0 | | 0 0 0 1 | // Translation (4\times4 homogeneous) T(tₓ, t_y, t_z) = | 1 0 0 tₓ | | 0 1 0 t_y | | 0 0 1 t_z | | 0 0 0 1 |

Combining Transforms: Order Matters

Matrix multiplication is associative but not commutative: Rotate then Translate ≠ Translate then Rotate. The convention is that the rightmost matrix is applied first to the column vector:

// Apply: first scale, then rotate, then translate M = T \cdot R \cdot S v_world = M \cdot v_local = T \cdot (R \cdot (S \cdot v_local)) ↑ last applied ↑ second ↑ first applied Mistake: M = S \cdot R \cdot T would translate first, then rotate, placing the object in a completely different location.

3. Homogeneous Coordinates

3D translation cannot be represented as a 3×3 matrix multiplication — translation is not a linear transform (it does not preserve the origin). The elegant solution: lift 3D coordinates to 4D by adding a fourth component w.

// 3D point \to homogeneous: (x, y, z) \to (x, y, z, 1) // 3D vector/direction: (x, y, z) \to (x, y, z, 0) The w component distinguishes points (w=1) from directions (w=0): T \cdot (x, y, z, 1)ᵀ = (x+tₓ, y+t_y, z+t_z, 1) ✓ translated T \cdot (x, y, z, 0)ᵀ = (x, y, z, 0) ✓ direction unchanged! // Converting back to 3D (perspective divide): (X, Y, Z, W) \to (X/W, Y/W, Z/W) for W \neq 0 After the perspective projection matrix, W \neq 1 in general. The GPU performs the perspective divide automatically at the end of the vertex shader stage (clip space \to NDC).

Homogeneous point at infinity: (x, y, z, 0) represents the direction (x, y, z) as a "point at infinity" — the limit of (tx, ty, tz, 1) as t→∞. Parallel lines in projective geometry meet at a point at infinity, which is why railroad tracks converge at the vanishing point in a perspective rendering.

4. The MVP Transform Pipeline

A vertex travels through three coordinate spaces before becoming a pixel. Each space is connected by a matrix multiply:

// Full vertex transform chain v_clip = P \cdot V \cdot M \cdot v_local M = Model matrix (local space \to world space) V = View matrix (world space \to camera/eye space) P = Projection matrix (eye space \to clip space) // Model matrix: position + orient + scale the object M = T_world \cdot R_world \cdot S_local // View matrix: inverse of the camera's world transform // Camera is at pos, looking at target, up vector is up: V = lookAt(pos, target, up) = inverse(T_pos \cdot R_camera) // Perspective projection matrix (OpenGL convention, NDC: [-1,1]) P = | 2n/(r-l) 0 (r+l)/(r-l) 0 | | 0 2n/(t-b) (t+b)/(t-b) 0 | | 0 0 -(f+n)/(f-n) -2fn/(f-n)| | 0 0 -1 0 | Simplified for symmetric frustum (l=-r, b=-t): fovY = vertical FOV in radians aspect = width/height P[0][0] = 1/(aspect\cdottan(fovY/2)) P[1][1] = 1/tan(fovY/2) P[2][2] = -(f+n)/(f-n) P[2][3] = -2fn/(f-n) P[3][2] = -1

After P·V·M·v, the result is in clip space. The GPU clips triangles to the clip cube, then performs the perspective divide (÷w) to reach NDC (Normalized Device Coordinates, x∈[−1,1], y∈[−1,1], z∈[−1,1] for OpenGL). Finally, the viewport transform maps NDC to pixel coordinates:

// Viewport transform (NDC \to window pixels) x_px = (x_ndc + 1) / 2 \cdot viewport_width + x_origin y_px = (y_ndc + 1) / 2 \cdot viewport_height + y_origin // Note: y may be flipped depending on API (D3D flips, OpenGL doesn't)

5. Quaternions for Rotation

Euler angles (yaw, pitch, roll) are intuitive but suffer from gimbal lock: when two rotation axes become aligned, a degree of freedom is lost and smooth interpolation breaks. Rotation matrices work but use 9 floats for only 3 degrees of freedom and accumulate floating-point drift over many composing operations. Quaternions solve both problems using just 4 floats.

A quaternion q = (w, x, y, z) = w + xi + yj + zk, where i, j, k are imaginary units satisfying i² = j² = k² = ijk = −1. A unit quaternion (|q|=1) represents a 3D rotation:

// Quaternion encoding rotation by angle θ about unit axis â=(ax,ay,az): q = (cos(θ/2), ax\cdotsin(θ/2), ay\cdotsin(θ/2), az\cdotsin(θ/2)) = (w, x, y, z) // Applying rotation to vector v (as pure quaternion v=(0,v)): v_rotated = q \cdot v \cdot q* where q* = (w, -x, -y, -z) (conjugate = inverse for unit q) // Equivalent matrix form (row-major used here for display): R = | 1-2(y²+z²) 2(xy-wz) 2(xz+wy) | | 2(xy+wz) 1-2(x²+z²) 2(yz-wx) | | 2(xz-wy) 2(yz+wx) 1-2(x²+y²)| // Composing two rotations: quaternion multiplication q_total = q₂ \cdot q₁ (apply q₁ first, then q₂) w = w₂w₁ - x₂x₁ - y₂y₁ - z₂z₁ x = w₂x₁ + x₂w₁ + y₂z₁ - z₂y₁ y = w₂y₁ - x₂z₁ + y₂w₁ + z₂x₁ z = w₂z₁ + x₂y₁ - y₂x₁ + z₂w₁

6. Interpolation and Smooth Rotation

For animation, we need smooth transitions between orientations. Linearly interpolating matrix entries or Euler angles gives non-constant angular velocity and poor results. Quaternion interpolation is the standard solution.

LERP (Linear Interpolation)

// Naïve quaternion lerp — fast but not constant speed q(t) = normalize((1-t)\cdotq₁ + t\cdotq₂) for t \in [0, 1] Use when: speed difference is imperceptible (t range small), or performance is critical (LERP is one add + normalize)

SLERP (Spherical Linear Interpolation)

// Slerp — constant angular velocity along great circle on unit 4-sphere Ω = arccos(q₁ \cdot q₂) (dot product of quaternion components) q(t) = [sin((1-t)Ω) / sinΩ]\cdotq₁ + [sin(tΩ) / sinΩ]\cdotq₂ Edge cases: - If q₁\cdotq₂ < 0: negate q₂ before slerp (short path vs long path) - If Ω \approx 0: fall back to LERP (quaternions nearly identical) Slerp is the gold standard for animation blending — it preserves constant angular velocity and shortest path.

Gimbal lock explained: With Euler angles (e.g. Z·Y·X order), if the Y rotation is ±90°, the X and Z axes become parallel — they both rotate around the same world axis. One degree of freedom is lost. No combination of X and Z rotation can then produce rotation around the lost axis. Quaternions have no such degeneracy because they operate directly on the 4D unit sphere.

Squad and Higher-Order Splines

For smooth paths through multiple keyframe orientations, SLERP alone creates C¹ discontinuities at keyframes (velocity is continuous but angular acceleration is not). Squad (Spherical Cubic Interpolation) extends Bézier curves to quaternion space, producing C² smooth rotation paths used in game camera systems and cinematic animation.

7. Normal Transforms and the Normal Matrix

Surface normals are direction vectors and transform differently from points under non-uniform scaling. If you transform a normal by the same matrix M as the vertices, non-uniform scale will distort the normals and break lighting calculations.

// Wrong: N_world = M \cdot N_local (breaks for non-uniform scale) // Correct: N_world = transpose(inverse(M)) \cdot N_local The "normal matrix" = transpose(inverse(M)) Often precomputed on the CPU and passed as a uniform: uniform mat3 normalMatrix; // in GLSL For orthogonal matrices (rotation only, no scale): inverse(M) = transpose(M) \to normal matrix = M itself (so pure rotation transforms normals correctly with the same matrix) Proof sketch: If t is a tangent (Mv is a transformed point), normal N must stay perpendicular: N\cdott = 0. After transform: (AN)\cdot(Mt) = 0 \to (AN)ᵀ(Mt) = 0 \to Nᵀ(AᵀM)t = 0 \to AᵀM = I \to A = (M⁻¹)ᵀ = transpose(inverse(M)) ✓

8. Linear Algebra on the GPU: GLSL & WGSL

GPU shader languages have first-class linear algebra types. The GPU executes vector and matrix operations in single instructions across all lanes of a SIMD processor — what takes a loop in CPU code is a single op on the GPU.

GLSL (WebGL / OpenGL)

// GLSL built-in types vec2, vec3, vec4 // float vectors mat2, mat3, mat4 // float matrices (column-major) ivec3, uvec3, bvec3 // integer, unsigned, bool vectors // Common operations (all built-in, hardware-accelerated) float d = dot(a, b); // dot product vec3 n = cross(a, b); // cross product vec3 u = normalize(v); // normalise (unit length) float l = length(v); // Euclidean length vec3 r = reflect(d, n); // reflection vector vec3 rf = refract(d, n, eta); // refraction vector float f = mix(a, b, t); // linear interpolation (LERP) vec3 c = clamp(v, 0.0, 1.0); // component-wise clamp // Matrix-vector multiplication (column vector post-multiplication) vec4 v_clip = projMatrix * viewMatrix * modelMatrix * v_local; // Accessing columns (mat4 is array of 4 column vec4s) vec4 col0 = myMat[0]; // first column float m23 = myMat[2][3];// row 3, col 2 (column-major: col 2, row 3)

WGSL (WebGPU)

// WGSL (WebGPU Shading Language) — similar but stricter var v: vec3f = vec3f(1.0, 0.0, 0.0); var m: mat4x4f = /* ... */; let d: f32 = dot(a, b); let n: vec3f = normalize(cross(b - a, c - a)); // Matrix multiply: same notation let clip_pos: vec4f = uniforms.mvp * vertex_pos; // Key difference from GLSL: explicit types everywhere, // no implicit conversions (1 instead of 1.0 is a compile error)

Performance note: On modern GPUs, a mat4×vec4 multiply is a single hardware instruction executing 16 multiply-accumulate operations simultaneously. Transforming a batch of one million vertices takes a few milliseconds on a mid-range GPU — the bottleneck is usually memory bandwidth (fetching the vertex data), not the arithmetic.

Practical Tips for 3D Programming

Always normalise after SLERP/quaternion operations — floating-point drift accumulates and eventually makes |q| ≠ 1, causing subtle scaling artifacts in rotation matrices.
Use glMatrix (JavaScript) or GLM (C++) for production work — hand-coded matrix math has subtle column/row major bugs that cost hours to debug.
Watch the multiplication order carefully — glMatrix functions like mat4.multiply(out, a, b) compute a·b, meaning b is applied first. Many bugs come from reversing this order.
Avoid inverting matrices on the GPU — single-precision float inverse of a nearly-singular matrix loses accuracy fast. Compute inverses on the CPU and upload as uniforms.
Depth precision — the perspective projection compresses depth non-linearly, putting most precision near the near plane. Use a logarithmic depth buffer or reversed-Z (near=1, far=0) for large-scale scenes.