Linear Algebra for 3D Graphics: Vectors, Matrices, Quaternions
Every triangle rendered on screen has passed through at least four
matrix multiplications before it reaches your eyes. The Model matrix
places it in the world, the View matrix moves the world in front of
the camera, the Projection matrix applies perspective, and the
Viewport matrix maps to pixel coordinates. Understanding the linear
algebra behind these transforms is the foundation of all 3D
programming — from shader code to physics engines to machine learning
with 3D data.
1. Vectors: Points, Directions, and Operations
A vector in 3D graphics is a tuple (x, y, z) that represents either a
point in space or a direction with
magnitude. The distinction matters: translating a direction does not
change the direction, but translating a point changes its position.
Essential Operations
// Dot product: scalar result a · b = aₓbₓ + a_y b_y + a_z b_z =
|a||b|cosθ Uses: - Angle between vectors: θ = arccos(a·b / |a||b|) -
Project a onto b: proj = (a·b̂)·b̂ where b̂ = b/|b| - Diffuse lighting:
N·L = cos(angle to light) for Lambert shading - Backface culling: if
(N·V < 0) skip triangle // Cross product: vector result (perpendicular
to both inputs) a × b = (a_y b_z − a_z b_y, a_z bₓ − aₓ b_z, aₓ b_y −
a_y bₓ) |a × b| = |a||b|sinθ Uses: - Triangle normal: N = (B−A) ×
(C−A), then normalize - Orthonormal basis construction (e.g. camera
frame) - Torque: τ = r × F in physics simulation - Handedness check:
(a×b)·c determines coordinate system orientation // Normalization:
make unit length â = a / |a| where |a| = sqrt(aₓ² + a_y² + a_z²) Note:
normalizing a zero vector is undefined (NaN guard needed)
Convention clash: OpenGL and WebGL use
column vectors with right-multiplication (M·v).
DirectX and HLSL use row vectors with
left-multiplication (v·M). The same geometric transform requires the
transpose of the matrix between the two conventions.
GLM (C++) uses OpenGL convention; glMatrix (JavaScript) also uses
column vectors with column-major storage (same array layout as
OpenGL).
2. Matrices and Transform Types
A 3×3 matrix encodes any linear transform: rotation, scale, shear. A
4×4 matrix encodes any affine transform in 3D: linear transforms plus
translation (via homogeneous coordinates). GPU shaders and all 3D APIs
work with 4×4 matrices.
Rotation Matrices (3×3)
// Rotation about X axis by angle θ Rₓ(θ) = | 1 0 0 | | 0 cosθ −sinθ |
| 0 sinθ cosθ | // Rotation about Y axis by angle θ R_y(θ) = | cosθ 0
sinθ | | 0 1 0 | | −sinθ 0 cosθ | // Rotation about Z axis by angle θ
R_z(θ) = | cosθ −sinθ 0 | | sinθ cosθ 0 | | 0 0 1 | // Rotation about
arbitrary unit axis â by angle θ (Rodrigues) R = cosθ·I + (1−cosθ)·âaᵀ
+ sinθ·[â]× where [â]× is the skew-symmetric cross product matrix of â
Matrix multiplication is associative but not commutative:
Rotate then Translate ≠ Translate then Rotate. The convention is that
the rightmost matrix is applied first to the column vector:
// Apply: first scale, then rotate, then translate M = T · R · S
v_world = M · v_local = T · (R · (S · v_local)) ↑ last applied ↑
second ↑ first applied Mistake: M = S · R · T would translate first,
then rotate, placing the object in a completely different location.
3. Homogeneous Coordinates
3D translation cannot be represented as a 3×3 matrix multiplication —
translation is not a linear transform (it does not preserve the
origin). The elegant solution: lift 3D coordinates to 4D by adding a
fourth component w.
// 3D point → homogeneous: (x, y, z) → (x, y, z, 1) // 3D
vector/direction: (x, y, z) → (x, y, z, 0) The w component
distinguishes points (w=1) from directions (w=0): T · (x, y, z, 1)ᵀ =
(x+tₓ, y+t_y, z+t_z, 1) ✓ translated T · (x, y, z, 0)ᵀ = (x, y, z, 0)
✓ direction unchanged! // Converting back to 3D (perspective divide):
(X, Y, Z, W) → (X/W, Y/W, Z/W) for W ≠ 0 After the perspective
projection matrix, W ≠ 1 in general. The GPU performs the perspective
divide automatically at the end of the vertex shader stage (clip space
→ NDC).
Homogeneous point at infinity: (x, y, z, 0)
represents the direction (x, y, z) as a "point at infinity" — the
limit of (tx, ty, tz, 1) as t→∞. Parallel lines in projective geometry
meet at a point at infinity, which is why railroad tracks converge at
the vanishing point in a perspective rendering.
4. The MVP Transform Pipeline
A vertex travels through three coordinate spaces before becoming a
pixel. Each space is connected by a matrix multiply:
// Full vertex transform chain v_clip = P · V · M · v_local M = Model
matrix (local space → world space) V = View matrix (world space →
camera/eye space) P = Projection matrix (eye space → clip space) //
Model matrix: position + orient + scale the object M = T_world ·
R_world · S_local // View matrix: inverse of the camera's world
transform // Camera is at pos, looking at target, up vector is up: V =
lookAt(pos, target, up) = inverse(T_pos · R_camera) // Perspective
projection matrix (OpenGL convention, NDC: [-1,1]) P = | 2n/(r−l) 0
(r+l)/(r−l) 0 | | 0 2n/(t−b) (t+b)/(t−b) 0 | | 0 0 −(f+n)/(f−n)
−2fn/(f−n)| | 0 0 −1 0 | Simplified for symmetric frustum (l=−r,
b=−t): fovY = vertical FOV in radians aspect = width/height P[0][0] =
1/(aspect·tan(fovY/2)) P[1][1] = 1/tan(fovY/2) P[2][2] = −(f+n)/(f−n)
P[2][3] = −2fn/(f−n) P[3][2] = −1
After P·V·M·v, the result is in clip space. The GPU
clips triangles to the clip cube, then performs the perspective divide
(÷w) to reach NDC (Normalized Device Coordinates,
x∈[−1,1], y∈[−1,1], z∈[−1,1] for OpenGL). Finally, the viewport
transform maps NDC to pixel coordinates:
// Viewport transform (NDC → window pixels) x_px = (x_ndc + 1) / 2 ·
viewport_width + x_origin y_px = (y_ndc + 1) / 2 · viewport_height +
y_origin // Note: y may be flipped depending on API (D3D flips, OpenGL
doesn't)
5. Quaternions for Rotation
Euler angles (yaw, pitch, roll) are intuitive but suffer from
gimbal lock: when two rotation axes become aligned, a
degree of freedom is lost and smooth interpolation breaks. Rotation
matrices work but use 9 floats for only 3 degrees of freedom and
accumulate floating-point drift over many composing operations.
Quaternions solve both problems using just 4 floats.
A quaternion q = (w, x, y, z) = w + xi + yj + zk, where i, j, k are
imaginary units satisfying i² = j² = k² = ijk = −1. A
unit quaternion (|q|=1) represents a 3D rotation:
// Quaternion encoding rotation by angle θ about unit axis
â=(ax,ay,az): q = (cos(θ/2), ax·sin(θ/2), ay·sin(θ/2), az·sin(θ/2)) =
(w, x, y, z) // Applying rotation to vector v (as pure quaternion
v=(0,v)): v_rotated = q · v · q* where q* = (w, −x, −y, −z) (conjugate
= inverse for unit q) // Equivalent matrix form (row-major used here
for display): R = | 1−2(y²+z²) 2(xy−wz) 2(xz+wy) | | 2(xy+wz)
1−2(x²+z²) 2(yz−wx) | | 2(xz−wy) 2(yz+wx) 1−2(x²+y²)| // Composing two
rotations: quaternion multiplication q_total = q₂ · q₁ (apply q₁
first, then q₂) w = w₂w₁ − x₂x₁ − y₂y₁ − z₂z₁ x = w₂x₁ + x₂w₁ + y₂z₁ −
z₂y₁ y = w₂y₁ − x₂z₁ + y₂w₁ + z₂x₁ z = w₂z₁ + x₂y₁ − y₂x₁ + z₂w₁
6. Interpolation and Smooth Rotation
For animation, we need smooth transitions between orientations.
Linearly interpolating matrix entries or Euler angles gives
non-constant angular velocity and poor results. Quaternion
interpolation is the standard solution.
LERP (Linear Interpolation)
// Naïve quaternion lerp — fast but not constant speed q(t) =
normalize((1−t)·q₁ + t·q₂) for t ∈ [0, 1] Use when: speed difference
is imperceptible (t range small), or performance is critical (LERP is
one add + normalize)
SLERP (Spherical Linear Interpolation)
// Slerp — constant angular velocity along great circle on unit
4-sphere Ω = arccos(q₁ · q₂) (dot product of quaternion components)
q(t) = [sin((1−t)Ω) / sinΩ]·q₁ + [sin(tΩ) / sinΩ]·q₂ Edge cases: - If
q₁·q₂ < 0: negate q₂ before slerp (short path vs long path) - If Ω ≈
0: fall back to LERP (quaternions nearly identical) Slerp is the gold
standard for animation blending — it preserves constant angular
velocity and shortest path.
Gimbal lock explained: With Euler angles (e.g. Z·Y·X
order), if the Y rotation is ±90°, the X and Z axes become parallel —
they both rotate around the same world axis. One degree of freedom is
lost. No combination of X and Z rotation can then produce rotation
around the lost axis. Quaternions have no such degeneracy because they
operate directly on the 4D unit sphere.
Squad and Higher-Order Splines
For smooth paths through multiple keyframe orientations, SLERP alone
creates C¹ discontinuities at keyframes (velocity is continuous but
angular acceleration is not). Squad (Spherical Cubic
Interpolation) extends Bézier curves to quaternion space, producing C²
smooth rotation paths used in game camera systems and cinematic
animation.
7. Normal Transforms and the Normal Matrix
Surface normals are direction vectors and transform differently from
points under non-uniform scaling. If you transform a normal by the
same matrix M as the vertices, non-uniform scale will distort the
normals and break lighting calculations.
// Wrong: N_world = M · N_local (breaks for non-uniform scale) //
Correct: N_world = transpose(inverse(M)) · N_local The "normal matrix"
= transpose(inverse(M)) Often precomputed on the CPU and passed as a
uniform: uniform mat3 normalMatrix; // in GLSL For orthogonal matrices
(rotation only, no scale): inverse(M) = transpose(M) → normal matrix =
M itself (so pure rotation transforms normals correctly with the same
matrix) Proof sketch: If t is a tangent (Mv is a transformed point),
normal N must stay perpendicular: N·t = 0. After transform: (AN)·(Mt)
= 0 → (AN)ᵀ(Mt) = 0 → Nᵀ(AᵀM)t = 0 → AᵀM = I → A = (M⁻¹)ᵀ =
transpose(inverse(M)) ✓
8. Linear Algebra on the GPU: GLSL & WGSL
GPU shader languages have first-class linear algebra types. The GPU
executes vector and matrix operations in single instructions across
all lanes of a SIMD processor — what takes a loop in CPU code is a
single op on the GPU.
GLSL (WebGL / OpenGL)
// GLSL built-in types vec2, vec3, vec4 // float vectors mat2, mat3,
mat4 // float matrices (column-major) ivec3, uvec3, bvec3 // integer,
unsigned, bool vectors // Common operations (all built-in,
hardware-accelerated) float d = dot(a, b); // dot product vec3 n =
cross(a, b); // cross product vec3 u = normalize(v); // normalise
(unit length) float l = length(v); // Euclidean length vec3 r =
reflect(d, n); // reflection vector vec3 rf = refract(d, n, eta); //
refraction vector float f = mix(a, b, t); // linear interpolation
(LERP) vec3 c = clamp(v, 0.0, 1.0); // component-wise clamp //
Matrix-vector multiplication (column vector post-multiplication) vec4
v_clip = projMatrix * viewMatrix * modelMatrix * v_local; // Accessing
columns (mat4 is array of 4 column vec4s) vec4 col0 = myMat[0]; //
first column float m23 = myMat[2][3];// row 3, col 2 (column-major:
col 2, row 3)
WGSL (WebGPU)
// WGSL (WebGPU Shading Language) — similar but stricter var v: vec3f
= vec3f(1.0, 0.0, 0.0); var m: mat4x4f = /* ... */; let d: f32 =
dot(a, b); let n: vec3f = normalize(cross(b - a, c - a)); // Matrix
multiply: same notation let clip_pos: vec4f = uniforms.mvp *
vertex_pos; // Key difference from GLSL: explicit types everywhere, //
no implicit conversions (1 instead of 1.0 is a compile error)
Performance note: On modern GPUs, a mat4×vec4
multiply is a single hardware instruction executing 16
multiply-accumulate operations simultaneously. Transforming a batch of
one million vertices takes a few milliseconds on a mid-range GPU — the
bottleneck is usually memory bandwidth (fetching the vertex data), not
the arithmetic.
Practical Tips for 3D Programming
Always normalise after SLERP/quaternion operations
— floating-point drift accumulates and eventually makes |q| ≠ 1,
causing subtle scaling artifacts in rotation matrices.
Use glMatrix (JavaScript) or GLM (C++) for
production work — hand-coded matrix math has subtle column/row major
bugs that cost hours to debug.
Watch the multiplication order carefully — glMatrix
functions like mat4.multiply(out, a, b) compute a·b,
meaning b is applied first. Many bugs come from reversing this
order.
Avoid inverting matrices on the GPU —
single-precision float inverse of a nearly-singular matrix loses
accuracy fast. Compute inverses on the CPU and upload as uniforms.
Depth precision — the perspective projection
compresses depth non-linearly, putting most precision near the near
plane. Use a logarithmic depth buffer or reversed-Z (near=1, far=0)
for large-scale scenes.