Learning #19 — Algorithms & Computational Complexity: Sorting, Pathfinding and NP-Hard Problems

Why Complexity Is the Right Abstraction

Algorithms are precise recipes: given this input, do these specific steps. But two algorithms can compute the same answer using dramatically different amounts of time or memory. Measuring that difference in wall-clock seconds is useless — it depends on your hardware, your compiler, what other processes are running. What we want is a measure that captures the essential difficulty of the problem, independent of hardware.

Big-O notation provides exactly that. O(n²) means: if you double the input size, the runtime roughly quadruples. O(n log n) means doubling the input only slightly more than doubles the work. O(2ⁿ) means each extra input doubles the runtime — a death sentence for large inputs. These growth rates separate algorithms into practical tiers, and the difference between a tier-1 and tier-3 algorithm can mean the difference between a responsive app and one that freezes your browser.

The six simulations below make complexity visible: you watch O(n²) algorithms struggle as n grows, and see how clever data structures reduce O(n²) graph traversal to O((V + E) log V) without changing the answer at all.

Part 1: Sorting Algorithms

Sorting Simulation — seeing O(n²) vs O(n log n) live

Sorting is the first algorithm most programmers study and the most thoroughly analysed in computer science. Its simplicity makes it a perfect laboratory for comparing complexity classes: bubble sort and insertion sort are O(n²) in the worst case; merge sort and heap sort are O(n log n) always; quicksort is O(n log n) on average but O(n²) worst-case without randomised pivot selection.

Sorting Algorithm Complexity Comparison

Algorithm     Best       Average    Worst      Space     Stable?
──────────────────────────────────────────────────────────────────
Bubble        Ω(n)       Θ(n²)      O(n²)      O(1)      Yes
Insertion     Ω(n)       Θ(n²)      O(n²)      O(1)      Yes
Selection     Ω(n²)      Θ(n²)      O(n²)      O(1)      No
Merge         Ω(n log n) Θ(n log n) O(n log n) O(n)      Yes
Quicksort     Ω(n log n) Θ(n log n) O(n²)*     O(log n)  No
Heap Sort     Ω(n log n) Θ(n log n) O(n log n) O(1)      No
Radix Sort    Ω(nk)      Θ(nk)      O(nk)      O(n+k)    Yes
Tim Sort      Ω(n)       Θ(n log n) O(n log n) O(n)      Yes

* Quicksort worst case avoided with randomised pivot

Inversions and insertion sort:
  Cost ∝ number of inversions in array
  Already-sorted array: O(n) comparisons (best case)
  Reverse-sorted: n(n-1)/2 inversions → O(n²)

Lower bound for comparison-based sorting:
  Any comparison-based sort requires Ω(n log n) comparisons
  Proof: decision tree has n! leaves → height ≥ log₂(n!)

The Sorting simulation visualises all major algorithms side-by-side on the same array. Choose array size (8 to 1000 elements), distribution (random, nearly sorted, reverse sorted, many duplicates), and watch the bar chart animate. The comparison counter confirms the Big-O prediction: merge sort's counter grows as n log n; bubble sort's counter grows as n². Nearly sorted arrays reveal insertion sort's hidden strength — its Ω(n) best case.

Practical note: Tim Sort — Python's and Java's default sort — is a hybrid of merge sort and insertion sort. It detects naturally ordered "runs" and merges them. On nearly-sorted real data it approaches O(n), making it faster than a theoretically optimal O(n log n) algorithm on random data.

Part 2: Graph Pathfinding

Pathfinding Simulation — Dijkstra and A*

Navigation, network routing, game AI — all reduce to the same problem: find the shortest path between two nodes in a weighted graph. Dijkstra's algorithm solves this optimally in O((V + E) log V) using a priority queue. A* extends Dijkstra with a heuristic function that estimates the remaining distance, dramatically reducing the number of nodes explored on structured graphs like grids.

Dijkstra's Algorithm & A*

Dijkstra (single-source shortest path):
  Input:  weighted graph G=(V,E), source s
  Output: dist[v] = shortest distance from s to v

  Priority queue (min-heap):
    dist[s] = 0; dist[v] = ∞ for v ≠ s
    while Q not empty:
      u = extract_min(Q)
      for each neighbour v of u:
        if dist[u] + w(u,v) < dist[v]:
          dist[v] = dist[u] + w(u,v)
          decrease_key(Q, v, dist[v])

  Complexity: O((V + E) log V) with binary heap
              O(E + V log V) with Fibonacci heap

A* search:
  f(n) = g(n) + h(n)
  g(n) = actual cost from source to n
  h(n) = admissible heuristic (never overestimates)

  Admissible heuristics for grids:
    h = Euclidean distance  (8-directional movement)
    h = Manhattan distance  (4-directional movement)
    h = Chebyshev distance  (8-dir, uniform diagonal cost)

  A* is optimal if h is admissible.
  A* visits fewer nodes than Dijkstra when h is informative.

Breadth-First Search (unweighted shortest path):
  Complexity: O(V + E)   finds shortest path by edge count

In the Pathfinding simulation, draw walls, set start and end points, then compare Dijkstra, A*, Breadth-First Search, and Depth-First Search on the same maze. The visited-node counter makes the difference stark: A* with Manhattan heuristic often explores 5–10× fewer nodes than Dijkstra on a grid, reaching the goal without scanning irrelevant areas. Depth-First Search finds a path quickly, but rarely the shortest one.

Maze Generation Simulation

The Maze simulation pairs naturally with pathfinding. It generates perfect mazes (no loops, one solution) using depth-first search with backtracking, Prim's algorithm, or Kruskal's with a Union-Find data structure. Since perfect mazes are spanning trees, they are the minimum connected subgraph of the grid — a beautiful crossover between graph theory and combinatorics.

Maze Generation — Algorithms & Data Structures

Depth-First Search (recursive backtracker):
  Perfect random mazes — unbiased but long corridors
  O(V) time, O(V) stack space

Randomised Prim's (growing tree):
  Adds random edge from frontier set
  More branching, shorter average path length
  O(E log E) with priority queue

Kruskal's + Union-Find:
  Assign random weights to all edges, sort, add if no cycle
  Union-Find with path compression:
    find: O(α(n)) ≈ O(1) amortised
    union: O(α(n)) amortised
  Overall: O(E α(E)) ≈ O(E)

Wilson's algorithm (loop-erased random walk):
  Generates mazes uniformly at random (truly unbiased)
  Expected O(V log V) time
  Every spanning tree equally likely

Part 3: Backtracking and Constraint Satisfaction

N-Queens Simulation — pruning the search space

Place N queens on an N×N chessboard so that no two queens attack each other. This is the canonical constraint satisfaction problem. Naïve brute force tries all N^N placements: for N=8 that is 16 million attempts. Basic backtracking reduces this to 15,720 attempts by abandoning partial solutions the moment a constraint is violated. With forward checking, it drops further to around 2,000.

N-Queens — Backtracking & Complexity

Problem:
  Place N queens on N×N board, no row/column/diagonal shared.

Solutions:
  N=1: 1    N=4: 2    N=8: 92    N=12: 14,200
  N=13: 73,712    N=14: 365,596    N=15: 2,279,184

Naïve brute force:      O(N^N)
Backtracking (rows):    O(N!) — one queen per row
With column tracking:   avoid O(N) column checks per placement
With diagonal tracking: two boolean arrays, size 2N-1 each

Backtracking schema:
  placeQueens(row, colsUsed, diag1, diag2):
    if row == N: solution found
    for col in 0..N-1:
      if col ∉ colsUsed AND (row-col) ∉ diag1 AND (row+col) ∉ diag2:
        recurse(row+1, ...)

Forward checking (AC-3):
  After placing queen, remove attacked squares from domains
  Prune subtrees where any variable has empty domain
  Significantly reduces backtracks for large N

The N-Queens simulation animates the backtracking search. Watch the algorithm place queens row by row, backtrack when stuck, try the next column, and eventually find a solution. A counter shows total recursive calls, revealing how dramatically constraint propagation compares to bare backtracking. Toggle "show all solutions" mode to count the full solution space.

Part 4: The Traveling Salesman Problem

TSP Simulation — NP-hard in practice

A salesman must visit N cities exactly once and return to the starting city, minimising total distance. The brute-force solution tries all (N-1)!/2 routes — for N=20 that is 60 quadrillion routes. TSP is NP-hard: no polynomial algorithm is known, and most researchers believe none exists. Yet for practical sizes, smart heuristics and exact branch- and-bound solvers find near-optimal solutions quickly.

Traveling Salesman — Complexity & Heuristics

Exact:
  Brute force:          O((n-1)!/2)    infeasible for n > 15
  Dynamic programming:  O(2ⁿ · n²)    Held-Karp algorithm
  Branch and Bound:     exponential worst case, good in practice

Constructive heuristics (quick, suboptimal):
  Nearest Neighbour:    O(n²)  — build tour greedily
    ratio to optimal:  1.25 average, can be 2× worse

  Greedy edge insertion: O(n² log n)
    sort edges by length, add if valid (no crossing, degree < 2)

Local search (improve existing tour):
  2-opt:  remove 2 edges, reconnect — O(n²) per pass
    iterate until no improvement: O(n² · iterations)
    typically within 5% of optimal

  3-opt:  remove 3 edges — O(n³) per pass — better but slower
  Lin-Kernighan (LK): sophisticated variable-depth search
    state-of-the-art heuristic, < 1% from optimal on most instances

NP-hardness:
  TSP is NP-hard: no polynomial algorithm known
  P ≠ NP (assumed): no efficient exact algorithm exists
  Triangle inequality TSP: 1.5-approximation (Christofides 1976)

The TSP simulation lets you place cities on a canvas, then choose your algorithm: Nearest Neighbour, Greedy Edge, or 2-opt. The tour length counter updates live as 2-opt swaps improve the route. For 30 cities, a few seconds of 2-opt refinement typically reaches within 3–5% of the optimal. Increase to 100+ cities and observe that Nearest Neighbour creates a grotesque crossing mess that 2-opt quickly untangles — intuition for why local search matters.

P vs NP: TSP is NP-hard, meaning a polynomial solution would let you solve all NP problems efficiently — including breaking RSA encryption. The Clay Mathematics Institute offers $1,000,000 for a proof that P = NP or P ≠ NP, making it one of the most valuable open problems in mathematics.

Part 5: Genetic Algorithms

Genetic Algorithm Simulation — evolution as optimisation

When the search space is too vast for exact methods and local search gets trapped in local optima, genetic algorithms use the mechanics of biological evolution — selection, crossover, mutation — to explore broadly. GAs are not guaranteed to find the global optimum, but on many practical problems they find excellent solutions with modest computation.

Genetic Algorithm — Operators & Convergence

Population:  P = {x₁, x₂, …, xₙ}  (candidate solutions, encoded as chromosomes)
Fitness:     f(x) → ℝ              (objective to maximise)

Selection operators:
  Tournament:   pick k candidates, return fittest
  Roulette:     P(select xᵢ) = f(xᵢ) / Σf(xⱼ)
  Rank:         probability ∝ rank, not raw fitness (diversity)

Crossover operators:
  Single-point:   cut at random position, swap tails
  Uniform:        each gene inherited from parent 1 with prob p
  Order crossover (OX): for permutation problems (TSP)
    copy segment from parent 1, fill remaining from parent 2 in order

Mutation operators:
  Bit-flip:       flip each bit with probability p_m
  Swap:           exchange two random positions
  Inversion:      reverse a random sub-sequence

Schema theorem (Holland):
  Short, low-order, above-average schemata grow exponentially
  Intrinsic parallelism: n chromosomes implicitly evaluate O(n³) schemata

Parameters in practice:
  Population: 50–500;  Crossover rate: 0.6–0.9;  Mutation: 0.001–0.05
  Too low mutation → premature convergence
  Too high mutation → random search

The Genetic Algorithm simulation uses TSP as its optimisation target. Watch the population converge: early generations show wildly scattered tour lengths, then selection pressure drags the distribution toward shorter tours, while mutation injections prevent the population from collapsing to a single solution. Plot the best/average fitness curves to see the classic staircase convergence pattern — punctuated jumps as crossover discovers a new structural improvement.

Algorithm Collection

Sorting

Sorting Algorithms

Bubble, merge, quick, heap, radix — animated comparison with live counters

Graph Search

Pathfinding

Dijkstra, A*, BFS, DFS on interactive grid — visited-cell count comparison

Graph Theory

Maze Generation

DFS backtracker, Prim's, Kruskal's with Union-Find, Wilson's uniform random

Backtracking

N-Queens

Constraint satisfaction, backtracking animation, forward checking

NP-Hard

Traveling Salesman

Nearest Neighbour heuristic, 2-opt local search, live tour optimisation

Metaheuristic

Genetic Algorithm

Population, crossover, mutation, fitness convergence curve

Cross-Collection Connections

The algorithms in this collection appear throughout the platform in disguise. A* pathfinding is the foundation of the Boids flocking simulation's obstacle avoidance — agents find collision-free paths using the same priority queue expansion. Union-Find from maze generation reappears in the Percolation simulation to track connected clusters through a lattice. Genetic algorithms power the Evolutionary Dynamics simulation directly. Even sorting is present in the Aerofoil simulation, where pressure values along the surface are sorted to find stagnation and separation points. Complexity theory connects everything: the reason we use greedy algorithms for pathfinding and heuristics for TSP is precisely because exact methods don't scale — a fact made viscerally clear by watching N-Queens explode beyond N=15.

Algorithms & Methods in This Collection

Big-O notation Bubble sort Merge sort (divide & conquer) Quick sort (random pivot) Heap sort (binary heap) Radix sort (non-comparison) Tim Sort Dijkstra (priority queue) A* (admissible heuristic) BFS / DFS DFS backtracking maze Kruskal's MST Union-Find (path compression) Wilson's algorithm Constraint satisfaction Held-Karp (TSP DP) 2-opt local search Nearest Neighbour heuristic Tournament selection Order crossover (OX) Inversion mutation