Learning #31 – Linear Algebra for Scientists: Matrices, Eigenvectors, Transformations and PCA

Linear algebra is the shared language of physics, engineering, and machine learning. A rotation in 3D space, the normal modes of a vibrating molecule, the weights of a neural network layer, and the principal stresses in a bridge girder are all expressed through the same objects: matrices, vectors, and the eigenvalue problem. This post develops linear algebra from geometric intuition to singular value decomposition.

Most scientists encounter linear algebra as a compulsory course before they need it. The result is that many learn the mechanics (row reduction, matrix multiplication) without the geometry. This post tries to do the opposite: start from geometric meaning and derive the algebra as the natural language for describing it.

1. Matrices as Linear Maps

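A matrix A of size m×n defines a linear function f: ℝⁿ → ℝᵐ by f(x) = Ax. Linear means two properties hold for all vectors x, y and scalars α:

  f(x + y) = f(x) + f(y)     (additivity)
  f(αx) = α f(x)             (homogeneity)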

Geometrically, a linear map sends straight lines to straight lines (or collapses a line to a single point when its direction lies in the null space of A, which for a square A can only happen when det A = 0), and sends the origin to itself. The columns of A tell you exactly where the standard basis vectors go: column j of A is f(eⱼ).
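Both claims are easy to check numerically. The following is a minimal sketch using NumPy (our choice; the post itself names no library), with an arbitrary illustrative 2×3 matrix standing in for A:

```python
import numpy as np

# An arbitrary 2x3 matrix: a linear map from R^3 to R^2 (m = 2, n = 3)
A = np.array([[1.0, 2.0, 0.0],
              [0.0, -1.0, 3.0]])

def f(x):
    """The linear map f(x) = Ax."""
    return A @ x

# Column j of A is the image of the j-th standard basis vector e_j
n = A.shape[1]
for j in range(n):
    e_j = np.zeros(n)
    e_j[j] = 1.0
    assert np.allclose(f(e_j), A[:, j])

# Linearity: additivity and homogeneity, checked on random inputs
rng = np.random.default_rng(0)
x, y = rng.standard_normal(n), rng.standard_normal(n)
alpha = 2.5
additive = np.allclose(f(x + y), f(x) + f(y))
homogeneous = np.allclose(f(alpha * x), alpha * f(x))
print(additive, homogeneous)

# The origin always maps to the origin
assert np.allclose(f(np.zeros(n)), 0.0)
```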

Composition and Basis Change

Composition:   (A ∘ B) x = A(Bx)   →   matrix product AB

Change of basis from B to C (B, C = matrices whose columns are the basis vectors in standard coords):
  [v]_C = M_{C←B} [v]_B   where M_{C←B} = C⁻¹ B

Similarity transform: A expressed in the new basis   →   A' = P⁻¹ A P
(P = change-of-basis matrix, columns = new basis vectors in old coords)

For orthonormal bases: P⁻¹ = Pᵀ  (rotation/reflection matrices)
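The basis-change formulas can be sanity-checked in a few lines. This NumPy sketch uses two arbitrary illustrative bases of ℝ² and shows that both coordinate vectors describe the same v, that a similarity transform leaves the eigenvalues unchanged, and that a rotation satisfies P⁻¹ = Pᵀ:

```python
import numpy as np

# Bases B and C of R^2, stored as matrices whose columns are the basis
# vectors written in standard coordinates (illustrative values)
B = np.array([[1.0, 1.0],
              [0.0, 1.0]])
C = np.array([[2.0, 0.0],
              [1.0, 1.0]])

v_B = np.array([3.0, -1.0])        # coordinates of v in basis B
v = B @ v_B                        # the same vector in standard coordinates

# Change of basis: [v]_C = C^{-1} B [v]_B
M_CB = np.linalg.solve(C, B)       # computes C^{-1} B without forming the inverse
v_C = M_CB @ v_B
assert np.allclose(C @ v_C, v)     # both coordinate vectors describe the same v

# Similarity transform: the same linear map written in the basis P
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
P = B                              # new basis vectors (columns) in old coordinates
A_new = np.linalg.solve(P, A @ P)  # P^{-1} A P

# The matrix changes, but the map's eigenvalues do not
print(np.sort(np.linalg.eigvals(A)), np.sort(np.linalg.eigvals(A_new)))

# Orthonormal basis (a rotation): P^{-1} = P^T
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
assert np.allclose(np.linalg.inv(R), R.T)
```

Using `np.linalg.solve` rather than an explicit inverse is a standard numerical habit; for these tiny matrices either would do.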

This is why the choice of coordinate system matters so much in physics: expressing a tensor in its principal axes (the basis of eigenvectors) makes its action diagonal and interpretable. The inertia tensor of a rigid body becomes diag(I₁, I₂, I₃) along the principal axes; the stress tensor at a point becomes three principal stresses without shear.
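As an illustration of finding principal axes, the sketch below builds the inertia tensor of a hypothetical rigid body made of four point masses (values invented for the example), taken about the origin rather than necessarily the centre of mass, and diagonalises it with NumPy's symmetric eigensolver `np.linalg.eigh`:

```python
import numpy as np

# Hypothetical rigid body: point masses (kg) at positions (m), illustrative values
masses = np.array([1.0, 2.0, 1.5, 0.5])
positions = np.array([[ 1.0,  0.0,  0.5],
                      [-0.5,  1.0,  0.0],
                      [ 0.0, -1.0,  1.0],
                      [ 0.5,  0.5, -1.0]])

# Inertia tensor about the origin: I = sum_k m_k (|r_k|^2 * Id - r_k r_k^T)
I = np.zeros((3, 3))
for m, r in zip(masses, positions):
    I += m * (np.dot(r, r) * np.eye(3) - np.outer(r, r))

# I is real symmetric, so eigh returns real eigenvalues (ascending) and an
# orthogonal matrix whose columns are the principal axes
principal_moments, axes = np.linalg.eigh(I)

# In the principal-axis frame the tensor is diagonal: diag(I1, I2, I3)
I_principal = axes.T @ I @ axes
print(np.round(I_principal, 10))
print("principal moments:", principal_moments)
```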

2. Determinants as Signed Volume Scaling

The determinant of a square matrix A equals the signed volume of the parallelotope spanned by its column vectors, which is also the factor by which A scales every signed volume. In two and three dimensions:

Determinant — Geometric and Algebraic Forms

2×2:  det(A) = ad − bc,   A = [a b; c d]
       (signed area of parallelogram spanned by columns)

3×3:  det(A) = a(ei−fh) − b(di−fg) + c(dh−eg),   A = [a b c; d e f; g h i]
       (Sarrus rule / cofactor expansion along row 1)

Properties:
  det(AB) = det(A) det(B)
  det(Aᵀ) = det(A)         (transpose preserves volume)
  det(A⁻¹) = 1/det(A)
  det(αA) = αⁿ det(A)      (A is n×n: scaling each of the n rows by α multiplies det by α)
  det(A) = 0  ⇔  A is singular  ⇔  columns are linearly dependent
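Each property in the list can be confirmed numerically. Here is a short NumPy sketch on random 3×3 matrices (the seed and α are arbitrary choices), including the 3×3 cofactor formula and the αⁿ scaling:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
alpha = 2.0

det = np.linalg.det  # LU-based determinant

# 3x3 cofactor expansion along row 1, compared with NumPy
(a, b, c), (d, e, f), (g, h, i) = A
cofactor = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
assert np.isclose(cofactor, det(A))

assert np.isclose(det(A @ B), det(A) * det(B))         # multiplicative
assert np.isclose(det(A.T), det(A))                    # transpose
assert np.isclose(det(np.linalg.inv(A)), 1 / det(A))   # inverse
assert np.isclose(det(alpha * A), alpha**n * det(A))   # alpha^n, not alpha

# Linearly dependent columns give a singular matrix (det = 0 up to rounding)
S = A.copy()
S[:, 2] = 2 * S[:, 0] - S[:, 1]
print(det(S))
```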

In the interactive matrix transforms visualiser, the determinant sets how a shape’s area changes under the transformation: a unit square becomes a parallelogram with area |det A|. A negative determinant indicates a reflection (orientation reversal). When det = 0, the entire plane collapses onto a line or a point.
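Without the visualiser, the same behaviour can be reproduced by mapping the corners of the unit square and measuring the result with the shoelace formula. In this sketch (the example matrices are illustrative), the signed area of the image comes out equal to det A, negative for the reflection and zero for the projection:

```python
import numpy as np

def polygon_signed_area(vertices):
    """Shoelace formula: positive for counter-clockwise vertex order."""
    x, y = vertices[:, 0], vertices[:, 1]
    return 0.5 * np.sum(x * np.roll(y, -1) - np.roll(x, -1) * y)

# Unit square, vertices listed counter-clockwise
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])

examples = {
    "shear":      np.array([[1.0, 1.5], [0.0, 1.0]]),
    "scaling":    np.array([[2.0, 0.0], [0.0, 0.5]]),
    "reflection": np.array([[1.0, 0.0], [0.0, -1.0]]),
    "projection": np.array([[1.0, 0.0], [0.0, 0.0]]),
}

for name, A in examples.items():
    image = square @ A.T          # apply A to each vertex (stored as rows)
    area = polygon_signed_area(image)
    print(f"{name:10s} det = {np.linalg.det(A):+.2f}  signed area = {area:+.2f}")
```

A reflection reverses the vertex order from counter-clockwise to clockwise, which is exactly why the shoelace sum changes sign.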

3. Eigenvectors and Eigenvalues

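An eigenvector of a square matrix A is a non-zero vector v satisfying Av = λv: the transformation keeps v on its own line through the origin, only stretching, shrinking, or (if λ < 0) flipping it, rather than rotating it off that line. The scalar λ is the corresponding eigenvalue.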

Characteristic Polynomial and Diagonalisation

Eigenvalue equation:   Av = λv   ⇔   (A − λI)v = 0
Characteristic poly:   det(A − λI) = 0

For 2×2: λ² − tr(A)λ + det(A) = 0
  λ₁,₂ = [tr(A) ± √(tr(A)² − 4 det(A))] / 2

Diagonalisation (if A has n independent eigenvectors):
  A = P D P⁻¹
  D = diag(λ₁, …, λₙ),   P = [v₁ | v₂ | … | vₙ]

Powers:  Aᵏ = P Dᵏ P⁻¹   (cheap: just raise each λᵢ to the power k)
Exponential: exp(At) = P exp(Dt) P⁻¹,  exp(Dt) = diag(e^(λ₁t), …, e^(λₙt))  (useful for linear ODE systems dx/dt = Ax)
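The formulas in this block can be exercised on a small example. The sketch below takes an illustrative 2×2 matrix with eigenvalues 2 and 5, checks the closed-form 2×2 roots against `np.linalg.eig`, and uses the diagonalisation for powers and for exp(At), cross-checking the exponential against a truncated Taylor series (our choice of reference, to avoid depending on SciPy):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# 2x2 closed form: lambda = [tr(A) +/- sqrt(tr(A)^2 - 4 det(A))] / 2
tr, det = np.trace(A), np.linalg.det(A)
disc = np.sqrt(tr**2 - 4 * det)
closed_form = np.array([(tr - disc) / 2, (tr + disc) / 2])

# General solver: eigenvalues and eigenvectors (columns of P)
eigvals, P = np.linalg.eig(A)
assert np.allclose(np.sort(eigvals), closed_form)

# Each eigenpair satisfies A v = lambda v
for lam, v in zip(eigvals, P.T):
    assert np.allclose(A @ v, lam * v)

# Diagonalisation A = P D P^{-1}
D = np.diag(eigvals)
P_inv = np.linalg.inv(P)
assert np.allclose(P @ D @ P_inv, A)

# Powers: A^k = P D^k P^{-1}, where D^k raises each eigenvalue to the power k
k = 10
A_k = P @ np.diag(eigvals**k) @ P_inv
assert np.allclose(A_k, np.linalg.matrix_power(A, k))

# Matrix exponential: exp(At) = P exp(Dt) P^{-1}
t = 0.1
expAt = P @ np.diag(np.exp(eigvals * t)) @ P_inv

# Cross-check against a truncated Taylor series sum_j (At)^j / j!
series = np.zeros_like(A)
term = np.eye(2)
for j in range(1, 30):
    series += term
    term = term @ (A * t) / j
assert np.allclose(expAt, series)
```

This route only works when A has a full set of independent eigenvectors; for defective matrices a dedicated routine such as a scaling-and-squaring matrix exponential is the usual choice.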

Eigenvalues govern the long-run behaviour of linear dynamical systems xₙ₊₁ = Axₙ: almost every initial state grows without bound if any |λ| > 1, and every state contracts to zero if all |λ| < 1. For continuous systems dx/dt = Ax, asymptotic stability requires all eigenvalues to have negative real parts.
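A small NumPy experiment makes both stability criteria concrete. The matrices here are illustrative: one with eigenvalues 0.7 and 0.4, one with eigenvalues 1.1 and 0.5, and the state matrix of the damped oscillator x'' + 0.5x' + 4x = 0:

```python
import numpy as np

def spectral_radius(A):
    """Largest eigenvalue magnitude, max |lambda|."""
    return np.max(np.abs(np.linalg.eigvals(A)))

def iterate(A, x0, steps):
    """Run the discrete system x_{n+1} = A x_n and return the final state."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = A @ x
    return x

x0 = np.array([1.0, 1.0])
contracting = np.array([[0.5, 0.2], [0.1, 0.6]])  # eigenvalues 0.7, 0.4
growing = np.array([[1.1, 0.0], [0.3, 0.5]])      # eigenvalues 1.1, 0.5

for A in (contracting, growing):
    print(spectral_radius(A), np.linalg.norm(iterate(A, x0, 100)))

# Continuous system dx/dt = A x: asymptotically stable iff every Re(lambda) < 0
def is_asymptotically_stable(A):
    return bool(np.all(np.linalg.eigvals(A).real < 0))

# State (x, x') for x'' + 0.5 x' + 4 x = 0
damped_oscillator = np.array([[0.0, 1.0], [-4.0, -0.5]])
print(is_asymptotically_stable(damped_oscillator))  # True
```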

4. The Spectral Theorem and Its Applications

The spectral theorem is the central result of linear algebra for physics:

Spectral Theorem (Real Symmetric Case)

Let A = Aᵀ (real symmetric, n×n).
Then:
  1. All eigenvalues of A are real.
  2. Eigenvectors for distinct eigenvalues are orthogonal.
  3. A is orthogonally diagonalisable:   A = Q Λ Qᵀ
     Q orthogonal (QᵀQ = I),  Λ = diag(λ₁, …, λₙ)

For Hermitian matrices (A = A†, complex):
  Same conclusions hold over ℂ.
  Real eigenvalues  ⇒  quantum observables (Hermitian operators) yield real measurement outcomes.
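The theorem's conclusions can be checked numerically for both the real symmetric and the complex Hermitian case. This sketch symmetrises random matrices (an arbitrary test construction) and compares NumPy's general eigensolver with `np.linalg.eigh`, the routine intended for symmetric/Hermitian input:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4

# Real symmetric: symmetrise a random matrix
M = rng.standard_normal((n, n))
A = (M + M.T) / 2

# eigh returns real eigenvalues in ascending order and orthonormal
# eigenvectors as the columns of Q
lam, Q = np.linalg.eigh(A)
assert np.allclose(Q.T @ Q, np.eye(n))          # Q orthogonal
assert np.allclose(Q @ np.diag(lam) @ Q.T, A)   # A = Q Lambda Q^T

# Complex Hermitian: H equals its conjugate transpose
Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
H = (Z + Z.conj().T) / 2

# The general solver assumes no symmetry, yet the eigenvalues of a
# Hermitian matrix still come out real up to rounding error
general_eigs = np.linalg.eigvals(H)
print(np.max(np.abs(general_eigs.imag)))         # tiny: rounding error only

mu, U = np.linalg.eigh(H)
assert np.allclose(U.conj().T @ U, np.eye(n))   # U unitary
assert np.allclose(U @ np.diag(mu) @ U.conj().T, H)
```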

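The spectral theorem has direct, foundational interpretations in multiple fields: the principal axes of inertia and stress tensors in mechanics, the real measurement outcomes of quantum observables, and the orthogonal principal components of a covariance matrix in PCA (Section 6).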

5. Singular Value Decomposition (SVD)

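Eigenvalue decomposition requires a square matrix, and even a square matrix need not be diagonalisable. SVD generalises it to any m×n matrix, always exists, and is computed with orthogonal transformations, which makes it numerically stable: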

SVD and the Pseudoinverse

A = U Σ Vᵀ    (any real m×n matrix)

U  m×m orthogonal  (left singular vectors = columns)
Σ  m×n diagonal   (singular values σ₁ ≥ σ₂ ≥ … ≥ 0)
V  n×n orthogonal  (right singular vectors = columns)

Relationship to eigenvalues:
  AᵀA = V ΣᵀΣ Vᵀ,  singular values σᵢ = √(eigenvalues of AᵀA)

Truncated SVD (rank-k approximation):
  A ≈ Aₖ = Uₖ Σₖ Vₖᵀ   (first k singular triplets; best rank-k approximation in the 2-norm and Frobenius norm, Eckart-Young theorem)

Moore-Penrose pseudoinverse:
  A⁺ = V Σ⁺ Uᵀ   where Σ⁺ (n×m) = diag(1/σ₁, …, 1/σᵣ, 0, …),  r = rank(A)
Least-squares solution: x* = A⁺ b  (minimises ‖Ax − b‖²; among all minimisers it has the smallest ‖x‖)
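The pieces of this block fit together as follows in NumPy. The sketch assumes a random tall 6×3 matrix, which has full column rank with probability one, so every σᵢ is non-zero and Σ⁺ simply inverts all of them:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 6, 3
A = rng.standard_normal((m, n))   # assumed full column rank (true for random data)
b = rng.standard_normal(m)

# Full SVD: U is m x m, s holds the min(m, n) singular values in descending
# order, Vt is V^T (n x n)
U, s, Vt = np.linalg.svd(A, full_matrices=True)
Sigma = np.zeros((m, n))
Sigma[:n, :n] = np.diag(s)
assert np.allclose(U @ Sigma @ Vt, A)

# Singular values are square roots of the eigenvalues of A^T A
eig_AtA = np.linalg.eigvalsh(A.T @ A)[::-1]      # reorder to descending
assert np.allclose(s, np.sqrt(eig_AtA))

# Pseudoinverse A+ = V Sigma+ U^T (Sigma+ is n x m)
Sigma_plus = np.zeros((n, m))
Sigma_plus[:n, :n] = np.diag(1.0 / s)            # safe: all sigma_i > 0 here
A_plus = Vt.T @ Sigma_plus @ U.T
assert np.allclose(A_plus, np.linalg.pinv(A))

# Least squares: x* = A+ b matches NumPy's lstsq solver
x_star = A_plus @ b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_star, x_lstsq)
```

For rank-deficient matrices, the reciprocals would be taken only for singular values above a tolerance, which is what `np.linalg.pinv` does through its `rcond` cutoff.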

SVD is the workhorse of numerical linear algebra: it solves least-squares problems (data fitting, tomographic reconstruction), computes low-rank approximations (image compression, latent semantic analysis), and provides the condition number κ(A) = σ_max / σ_min, which quantifies how sensitive the solution of Ax = b is to perturbations in A or b.
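Two of these uses are shown below: the Eckart-Young error of a truncated SVD, and the condition number. The sketch uses a random 8×5 matrix plus a hand-picked nearly singular 2×2 system (both illustrative) to show a tiny change in b moving the solution a long way:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 5))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

def truncated_svd(U, s, Vt, k):
    """Rank-k approximation A_k = U_k Sigma_k V_k^T."""
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

k = 2
A_k = truncated_svd(U, s, Vt, k)
assert np.linalg.matrix_rank(A_k) == k

# Eckart-Young: the optimal errors are set by the discarded singular values
frob_err = np.linalg.norm(A - A_k, "fro")
spec_err = np.linalg.norm(A - A_k, 2)
assert np.isclose(frob_err, np.sqrt(np.sum(s[k:] ** 2)))
assert np.isclose(spec_err, s[k])

# Condition number kappa(A) = sigma_max / sigma_min (2-norm)
kappa = s[0] / s[-1]
assert np.isclose(kappa, np.linalg.cond(A))

# A nearly singular 2x2 system: a 1e-4 change in b moves x by order 1
A2 = np.array([[1.0, 1.0], [1.0, 1.0001]])
b1 = np.array([2.0, 2.0001])
b2 = b1 + np.array([0.0, 0.0001])
x1 = np.linalg.solve(A2, b1)   # approximately [1, 1]
x2 = np.linalg.solve(A2, b2)   # approximately [0, 2]
print(np.linalg.cond(A2), x1, x2)
```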

6. Principal Component Analysis (PCA)

PCA finds the directions of maximum variance in a dataset. Given n data points in ℝᵈ (rows of the n×d matrix X, mean-centred), PCA diagonalises the sample covariance matrix:

PCA via Covariance Eigendecomposition

Sample covariance:   C = (1/(n−1)) Xᵀ X    (d×d, symmetric PSD)
Eigendecomposition:  C = Q Λ Qᵀ
  λ₁ ≥ λ₂ ≥ … ≥ λ_d ≥ 0   (principal variances)
  q₁, q₂, …, q_d             (principal components)

Projection onto first k PCs:
  Xₖ = X Qₖ   (Qₖ = first k columns of Q; n×k, low-dimensional representation)

Variance retained by k components:
  R_k = (λ₁ + … + λₖ) / (λ₁ + … + λ_d)

Connection to SVD:
  If X = U Σ Vᵀ then the eigenvectors of C are the columns of V,  λᵢ = σᵢ²/(n−1)
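Both routes to PCA can be run side by side. This NumPy sketch generates synthetic correlated data (three features driven by two hypothetical latent factors plus small noise), computes the components from the covariance eigendecomposition, and confirms that the SVD of X gives the same variances and, up to sign, the same directions:

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, k = 500, 3, 2

# Synthetic correlated data (illustrative): 3 features from 2 latent factors
latent = rng.standard_normal((n, 2)) * np.array([3.0, 1.0])
mixing = np.array([[1.0, 0.5, 0.2],
                   [0.3, -1.0, 0.8]])
X = latent @ mixing + 0.1 * rng.standard_normal((n, d))
X = X - X.mean(axis=0)                       # mean-centre each column

# Route 1: eigendecomposition of the sample covariance
C = X.T @ X / (n - 1)
lam, Q = np.linalg.eigh(C)                   # ascending order
order = np.argsort(lam)[::-1]                # re-sort descending
lam, Q = lam[order], Q[:, order]

# Projection onto the first k principal components
X_k = X @ Q[:, :k]                           # n x k
retained = lam[:k].sum() / lam.sum()         # R_k
print(f"variance retained by {k} PCs: {retained:.3f}")

# Route 2: SVD of the data matrix gives the same variances
U, s, Vt = np.linalg.svd(X, full_matrices=False)
assert np.allclose(lam, s**2 / (n - 1))

# Principal directions agree up to sign (eigenvectors are defined only up to +/-)
for i in range(d):
    assert np.isclose(abs(Q[:, i] @ Vt[i]), 1.0)

# The variance of each projected coordinate equals its eigenvalue
assert np.allclose(X_k.var(axis=0, ddof=1), lam[:k])
```

In practice the SVD route is often preferred because it never forms XᵀX, which squares the condition number.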

PCA appears throughout science: it is used to identify the dominant modes of climate variability (EOF analysis), to compress gene expression profiles (bioinformatics), to separate signal sources in EEG/MEG (when combined with independent component analysis), and to whiten (decorrelate and rescale) input features before training a neural network.

Interactive Visualisations

The matrix transforms simulation lets you build geometric intuition for all the concepts above. Use the 2×2 sliders to construct rotations (det = 1), reflections (det = −1), shears (det = 1, a repeated eigenvalue 1 with only one eigenvector direction), scalings, and projections (det = 0). The eigenvector overlay shows the invariant directions when real eigenvectors exist (a rotation by any angle other than 0° or 180° has none); the unit circle overlay shows where the circle maps under the transformation: an ellipse whose semi-axis lengths are the singular values.
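The two overlay facts can also be checked without the simulation. This sketch samples the unit circle densely (20001 points, an arbitrary resolution), maps it through a shear, and compares the longest and shortest image radii with the singular values; it also inspects the shear's eigenstructure and a 90° rotation:

```python
import numpy as np

theta = np.linspace(0.0, 2.0 * np.pi, 20001)
circle = np.vstack([np.cos(theta), np.sin(theta)])   # 2 x N unit vectors

shear = np.array([[1.0, 1.0],
                  [0.0, 1.0]])

# Image of the unit circle: an ellipse whose semi-axis lengths are the
# singular values (largest and smallest stretch of any unit vector)
radii = np.linalg.norm(shear @ circle, axis=0)
s = np.linalg.svd(shear, compute_uv=False)
print(radii.max(), s[0])   # agree up to sampling error
print(radii.min(), s[1])

# Shear: det = 1 and a repeated eigenvalue 1, but a 1-dimensional eigenspace
print(np.linalg.det(shear), np.linalg.eigvals(shear))
print(np.linalg.matrix_rank(shear - np.eye(2)))  # 1

# Rotation by 90 degrees: det = 1, no real eigenvectors (eigenvalues +/- i)
rot90 = np.array([[0.0, -1.0],
                  [1.0,  0.0]])
print(np.linalg.eigvals(rot90))
```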

Why linear algebra everywhere? Because the real world is rarely linear, but it often approximately is, locally. Linearisation (Taylor expansion around an equilibrium, Jacobian matrix of a dynamical system) reduces a smooth non-linear problem to a linear one for small deviations from equilibrium. The eigenvalues of that linear approximation determine local stability (provided none has zero real part) as well as the characteristic timescales and frequencies; the eigenvectors give the normal-mode shapes.
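A damped pendulum makes a compact worked example of linearisation. In this sketch the parameters are illustrative and the Jacobian is estimated by central finite differences (our choice, to avoid symbolic algebra). The eigenvalues then classify the hanging equilibrium as stable and the inverted one as unstable:

```python
import numpy as np

# Hypothetical example: damped pendulum, theta'' = -(g/L) sin(theta) - gamma theta'
g_over_L, gamma = 9.81, 0.5

def rhs(state):
    """Right-hand side of the first-order system (theta, omega)' = rhs."""
    theta, omega = state
    return np.array([omega, -g_over_L * np.sin(theta) - gamma * omega])

def jacobian(fun, x0, h=1e-6):
    """Central finite-difference Jacobian of fun at x0."""
    x0 = np.asarray(x0, dtype=float)
    J = np.zeros((x0.size, x0.size))
    for j in range(x0.size):
        dx = np.zeros_like(x0)
        dx[j] = h
        J[:, j] = (fun(x0 + dx) - fun(x0 - dx)) / (2 * h)
    return J

for name, eq in [("hanging (theta = 0)", [0.0, 0.0]),
                 ("inverted (theta = pi)", [np.pi, 0.0])]:
    lam = np.linalg.eigvals(jacobian(rhs, eq))
    stable = bool(np.all(lam.real < 0))
    print(f"{name}: eigenvalues {np.round(lam, 3)}, locally stable: {stable}")
```

At the hanging equilibrium the eigenvalues are −γ/2 ± i√(g/L − γ²/4): the real part sets the decay timescale (2/γ) and the imaginary part the oscillation frequency.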

Further Reading

The following topics build naturally from this post: