Particle Swarm Optimization (PSO) — Swarm Intelligence

1. Origin and intuition

Kennedy and Eberhart's insight was deceptively simple: good solutions attract neighbours. Just as a bird adjusts its flight to keep up with its flock while also remembering where it personally found the best food, each particle in PSO carries both a social memory (the swarm's best-known position, gbest) and a cognitive memory (its own best-known position, pbest).

The biological analogy extends further. In a real flock, no individual has global knowledge — each bird sees only nearby neighbours. Standard PSO abstracts this into a fully connected social topology where every particle can see gbest, but local topology variants restrict communication to nearest neighbours, producing richer exploration dynamics.

Historical note: Kennedy and Eberhart presented PSO at ICNN-95 (IEEE International Conference on Neural Networks). The algorithm was simultaneously inspired by flocking models (Reynolds' Boids) and by the social comparison theory of psychologist Leon Festinger — the idea that individuals adjust beliefs by comparing themselves to peers.

PSO belongs to the broad family of swarm intelligence algorithms alongside Ant Colony Optimisation (ACO), Artificial Bee Colony (ABC), and Firefly Algorithm. All share the key property of emergence: near-optimal global behaviour from simple local rules, without centralised control.

2. The PSO algorithm

PSO searches an n-dimensional continuous space. A swarm of N particles is randomly initialised within the search bounds. At each iteration, each particle updates its velocity and position:

Initialise N particles with random positions xᵢ ∈ [x_min, x_max]ⁿ and random velocities vᵢ ∈ [−v_max, v_max]ⁿ.
Evaluate objective function f(xᵢ) for each particle.
Update personal best: if f(xᵢ) < f(pbest_i), set pbest_i = xᵢ.
Update global best: if any pbest_i improves on gbest, set gbest = pbest_i.
Update velocity using the velocity equation (Section 3).
Update position: xᵢ = xᵢ + vᵢ.
Clamp: if xᵢ leaves [x_min, x_max], apply boundary handling.
Repeat Steps 2–7 until convergence or maximum iterations.

The algorithm assumes minimisation by convention (for maximisation, negate the objective). The search bounds and velocity limit v_max are the primary problem-specific settings.

Particle state

Each particle i carries four vectors:

xᵢ — current position in n-dimensional space
vᵢ — current velocity (direction and speed of movement)
pbest_i — best position ever found by particle i
f(pbest_i) — objective value at pbest_i

The swarm also maintains a single shared gbest — the best position found by any particle across all time.

3. Velocity update equation

The core of PSO is the velocity update rule. The canonical inertia-weight form (Shi and Eberhart 1998) is:

vᵢ(t+1) = ω \cdot vᵢ(t) + c₁ \cdot r₁ \cdot (pbest i - xᵢ(t)) + c₂ \cdot r₂ \cdot (gbest - xᵢ(t))

where r₁ and r₂ are independent uniform random numbers drawn from [0, 1] at each step for each dimension. The three terms correspond to three distinct forces:

ω·v

Inertia

Momentum: the particle keeps moving in its current direction. ω > 1 favours exploration; ω < 1 promotes convergence.

c₁·r₁

Cognitive component

The particle is attracted to its own best-known position — "nostalgia" for past personal success.

c₂·r₂

Social component

The particle is attracted to the swarm's global best — "following the crowd" toward the best-known region.

After computing the new velocity, position is updated simply as:

xᵢ(t+1) = xᵢ(t) + vᵢ(t+1)

Velocity clamping

Without bounds on velocity, particles can fly arbitrarily fast and overshoot the optimum. Velocity clamping constrains each dimension:

vᵢ,d = clamp(vᵢ,d, -v max,d, +v max,d)

A common heuristic sets v_max,d = (x_max,d − x_min,d) / 2 — half the search range per dimension. This prevents premature convergence caused by excessive exploration speed.

Stochastic components

The random scalars r₁ and r₂ are resampled per particle per dimension per iteration. This is essential: without randomness, particles in a symmetric configuration could lock into identical movement patterns and the swarm would collapse to a deterministic gradient-like update. The stochasticity also provides implicit restarts when a particle is far from both pbest and gbest.

4. Key parameters

Parameter	Symbol	Typical value	Effect
Swarm size	N	20–50	Larger N → better coverage, more evaluations per iteration
Inertia weight	ω	0.4–0.9 (linearly decayed)	High ω explores; low ω exploits. Decay from 0.9→0.4 is the most common schedule.
Cognitive coefficient	c₁	1.5–2.0	Weight of personal best attraction. Too high → particles diverge independently.
Social coefficient	c₂	1.5–2.0	Weight of global best attraction. Too high → premature convergence to local optima.
Velocity limit	v_max	½·(x_max−x_min)	Prevents particle explosion. Too small → slow convergence.
Max iterations	T	200–1000	Problem-dependent; monitor gbest improvement per iteration.

Inertia weight schedules

The inertia weight ω is the most sensitive parameter. A fixed low ω (≈ 0.4) causes rapid but possibly premature convergence. The linearly decreasing inertia weight (LDIW) balances exploration and exploitation:

ω(t) = ω max - (ω max - ω min) \cdot (t / T) // Typical: ω_max = 0.9, ω_min = 0.4, T = max iterations

Non-linear schedules (exponential decay, chaotic maps, adaptive ω based on swarm diversity) can outperform LDIW on specific problems but add complexity.

5. Convergence and stability

Clerc and Kennedy (2002) formalised the convergence conditions for the original PSO. Ignoring velocity clamping and treating the update as a stochastic difference equation, a single particle's expected trajectory converges if and only if:

// Necessary and sufficient condition for convergence: 0 < ω < 1 0 < c₁ + c₂ < 4(1 + ω) / (1 - ω²) // simplified bound // In practice: ω = 0.729, c₁ + c₂ \leq 4 (Clerc's constriction values)

Without parameter control, PSO oscillates and may not converge. The analysis shows that the optimal region in (ω, c₁, c₂) parameter space forms a triangular region known as the convergence triangle.

Stagnation and premature convergence

The most common failure mode is premature convergence: the entire swarm converges to a local optimum before exploring the full search space. This happens when:

The social coefficient c₂ is too large relative to c₁, causing particles to rush toward the current gbest.
The swarm lacks diversity — all pbest values cluster near one region.
The objective landscape has a deceptive local optimum that attracts the swarm.

Mitigation strategies include random restart for stagnated particles, local topology (lbest PSO, Section 6), or increasing swarm size N.

6. Variants: CPSO, APSO, lbest

Constriction Factor PSO (CPSO)

Clerc (1999) derived a constriction factor χ that guarantees convergence without explicit velocity clamping:

φ = c₁ + c₂, φ > 4 χ = 2 / |2 - φ - sqrt(φ² - 4φ)| // Standard values: c₁ = c₂ = 2.05 \to φ = 4.1 \to χ \approx 0.7298 vᵢ(t+1) = χ \cdot [vᵢ(t) + c₁\cdotr₁\cdot(pbest-xᵢ) + c₂\cdotr₂\cdot(gbest-xᵢ)]

CPSO with χ ≈ 0.73 and c₁ = c₂ = 2.05 is mathematically equivalent to inertia-weight PSO with ω = 0.729 and c₁ = c₂ = 1.495. The two formulations produce identical trajectories under the same random seeds.

Local Best (lbest) PSO

Instead of a single shared gbest, each particle i communicates only with its k nearest neighbours (typically k = 2 or 3 in a ring topology). Each particle uses the lbest — best position found among its local neighbourhood:

vᵢ(t+1) = ω\cdotvᵢ + c₁\cdotr₁\cdot(pbest i -xᵢ) + c₂\cdotr₂\cdot(lbest i -xᵢ)

lbest PSO is slower to converge but substantially more resistant to premature convergence on multi-modal functions. Good solutions propagate through the ring gradually, allowing different parts of the swarm to explore different basins of attraction simultaneously.

Adaptive PSO (APSO)

APSO (Zhan et al. 2009) monitors the swarm's evolutionary state (exploration / exploitation / convergence / jumping-out) using a measure of particle fitness ranking and density, and adapts ω and c₁, c₂ in real time. In the exploration state, ω is increased; in convergence, ω is decreased and social weight c₂ is raised. This removes the need for offline parameter tuning.

Bare-Bones PSO

Kennedy (2003) proposed a Gaussian sampling version that eliminates velocity entirely:

μᵢ = (pbest i + gbest) / 2 σᵢ = |pbest i - gbest| xᵢ(t+1) = Normal(μᵢ, σᵢ) // sample new position directly

Bare-Bones PSO is surprisingly competitive on many benchmarks and requires zero parameter tuning.

7. PSO vs GA vs SA

Property	PSO	Genetic Algorithm	Simulated Annealing
Population	Yes (N particles)	Yes (N individuals)	No (single solution)
Gradient required	No	No	No
Memory used	pbest, gbest	None (selection pressure)	None
Exploration mechanism	Inertia + random components	Crossover + mutation	Random perturbation, temperature schedule
Discrete spaces	Possible (BPSO)	Native (binary GA)	Native (SA on combinatorial)
Implementation complexity	Very low	Medium	Low
Convergence speed	Fast (few dozen iterations)	Medium	Slow (many perturbations)
Global optimum guarantee	No	No	Yes (for T → 0 slowly enough)

In practice, PSO is often preferred for continuous, low-to-medium dimensional optimisation (≤ 100 dimensions) because it is simple to implement, converges quickly, and has few hyperparameters. GAs come into their own for combinatorial problems and problems requiring structural variation (e.g. evolving neural network topology via NEAT). SA excels when the landscape has a clear annealing analogy or when a single high-quality solution is needed from a long run.

8. JavaScript implementation

// Particle Swarm Optimization — clean 2D example
// Minimises the Rastrigin function f(x,y) = 20 + x²−10cos(2πx) + y²−10cos(2πy)
// Global minimum at (0,0) = 0

function rastrigin(x, y) {
  return 20 + x*x - 10*Math.cos(2*Math.PI*x)
            + y*y - 10*Math.cos(2*Math.PI*y);
}

// PSO configuration
const N       = 40;      // swarm size
const DIMS    = 2;       // dimensionality
const LB      = -5.12;   // lower bound per dimension
const UB      =  5.12;   // upper bound per dimension
const W_MAX   = 0.9;     // initial inertia
const W_MIN   = 0.4;     // final inertia
const C1      = 1.5;     // cognitive coefficient
const C2      = 1.5;     // social coefficient
const VMAX    = (UB - LB) / 2;  // velocity limit
const MAX_IT  = 300;

// Initialise
const pos   = Array.from({length: N}, () => Array.from({length: DIMS}, () => LB + Math.random()*(UB-LB)));
const vel   = Array.from({length: N}, () => Array.from({length: DIMS}, () => (Math.random()-0.5)*2*VMAX));
const pbest = pos.map(p => [...p]);
const pfval = pbest.map(p => rastrigin(p[0], p[1]));

let gbest = pbest[pfval.indexOf(Math.min(...pfval))].slice();
let gfval = Math.min(...pfval);

// Main loop
for (let t = 0; t < MAX_IT; t++) {
  const w = W_MAX - (W_MAX - W_MIN) * (t / MAX_IT);   // linearly decreasing ω

  for (let i = 0; i < N; i++) {
    for (let d = 0; d < DIMS; d++) {
      const r1 = Math.random(), r2 = Math.random();

      // Velocity update
      vel[i][d] = w * vel[i][d]
                + C1 * r1 * (pbest[i][d] - pos[i][d])
                + C2 * r2 * (gbest[d]    - pos[i][d]);

      // Velocity clamping
      vel[i][d] = Math.max(-VMAX, Math.min(VMAX, vel[i][d]));

      // Position update
      pos[i][d] += vel[i][d];

      // Boundary handling: reflect
      if (pos[i][d] < LB) { pos[i][d] = LB; vel[i][d] *= -0.5; }
      if (pos[i][d] > UB) { pos[i][d] = UB; vel[i][d] *= -0.5; }
    }

    // Evaluate
    const fval = rastrigin(pos[i][0], pos[i][1]);

    // Update personal best
    if (fval < pfval[i]) {
      pfval[i] = fval;
      pbest[i] = pos[i].slice();
    }

    // Update global best
    if (fval < gfval) {
      gfval = fval;
      gbest = pos[i].slice();
    }
  }
}

console.log('Best position:', gbest);
console.log('Best value:',   gfval.toFixed(6));
// Typical result: value ≈ 0.000001, position ≈ [0.0001, 0.0002]

Boundary handling strategies

Four common approaches when a particle leaves the search bounds:

Absorbing: clamp position to bound, zero the velocity component that hit the boundary.
Reflecting: clamp position to bound, negate (and optionally scale) the velocity.
Damping: clamp position, multiply velocity by −λ (0 < λ < 1).
Periodic: wrap position to the opposite bound (toroidal space) — useful for periodic functions.

Reflecting boundary handling (used in the example above) is typically robust for most landscapes.

Tuning tip: When trying PSO on a new problem, start with N = 30, ω = 0.729, c₁ = c₂ = 1.494 (the CPSO defaults), 200 iterations. If the algorithm stagnates early, reduce c₂ slightly (to 1.3) or switch to a ring topology (lbest). If it is slow to converge, increase N or add a periodic random restart for the worst 10% of particles.

9. Applications

Engineering design

PSO is widely used for engineering optimisation where objectives are computationally expensive and gradients are unavailable or unreliable. Common examples: optimal PID controller gains, structural design of trusses and frames (weight minimisation subject to stress constraints), antenna array geometry for minimum side-lobe level.

Machine learning and neural networks

PSO can optimise neural network weights as an alternative to gradient-based backpropagation, particularly when the loss landscape has many saddle points or when gradient information is unavailable (e.g. non-differentiable activation functions). PSO-trained networks are less prone to getting trapped in poor local minima on small datasets.

Hyperparameter optimisation

PSO treats hyperparameters (learning rate, regularisation, architecture parameters) as dimensions in a continuous search space and optimises them against cross-validation loss. It naturally handles mixed continuous/integer dimensions when combined with rounding.

Multi-objective PSO (MOPSO)

For problems with multiple conflicting objectives (e.g. minimise cost and maximise performance), MOPSO maintains a Pareto archive of non-dominated solutions. The social attractor gbest is selected randomly from the archive, weighted by archive crowding. Over time the swarm traces out an approximation of the Pareto front.

Power systems and renewable energy

Economic dispatch, optimal power flow, wind turbine layout optimisation (maximise farm output while minimising wake losses) — all involve non-convex, non-differentiable objectives where PSO's gradient-free nature is a decisive advantage.

Scalability limit: Standard PSO degrades on very high-dimensional problems (d > 500). Random-sampling of r₁ and r₂ per dimension means the swarm explores a vanishingly small fraction of the space. Cooperative PSO (CPSO-H) addresses this by partitioning dimensions into sub-swarms that optimise their slice while holding others at their current gbest value.