Particle Swarm Optimization (PSO)
In 1995, James Kennedy and Russell Eberhart watched a flock of starlings bank and wheel as one — and turned the observation into a global optimisation algorithm. PSO needs no gradient, no problem structure, and almost no tuning: a swarm of candidate solutions moves through the search space guided by its own best-found positions and the swarm's collective best.
1. Origin and intuition
Kennedy and Eberhart's insight was deceptively simple:
good solutions attract neighbours. Just as a bird
adjusts its flight to keep up with its flock while also remembering
where it personally found the best food, each particle in PSO
carries both a social memory (the swarm's best-known position,
gbest) and a cognitive memory (its own best-known
position, pbest).
The biological analogy extends further. In a real flock, no individual
has global knowledge — each bird sees only nearby neighbours. Standard
PSO abstracts this into a fully connected social topology where every
particle can see gbest, but local topology variants
restrict communication to nearest neighbours, producing richer
exploration dynamics.
PSO belongs to the broad family of swarm intelligence algorithms alongside Ant Colony Optimisation (ACO), Artificial Bee Colony (ABC), and Firefly Algorithm. All share the key property of emergence: near-optimal global behaviour from simple local rules, without centralised control.
2. The PSO algorithm
PSO searches an n-dimensional continuous space. A swarm of N particles is randomly initialised within the search bounds. At each iteration, each particle updates its velocity and position:
- Initialise N particles with random positions xᵢ ∈ [xmin, xmax]ⁿ and random velocities vᵢ ∈ [−vmax, vmax]ⁿ.
- Evaluate objective function f(xᵢ) for each particle.
- Update personal best: if f(xᵢ) < f(pbesti), set pbesti = xᵢ.
- Update global best: if any pbesti improves on gbest, set gbest = pbesti.
- Update velocity using the velocity equation (Section 3).
- Update position: xᵢ = xᵢ + vᵢ.
- Clamp: if xᵢ leaves [xmin, xmax], apply boundary handling.
- Repeat Steps 2–7 until convergence or maximum iterations.
The algorithm assumes minimisation by convention (for maximisation, negate the objective). The search bounds and velocity limit vmax are the primary problem-specific settings.
Particle state
Each particle i carries four vectors:
- xᵢ — current position in n-dimensional space
- vᵢ — current velocity (direction and speed of movement)
- pbesti — best position ever found by particle i
- f(pbesti) — objective value at pbesti
The swarm also maintains a single shared gbest — the best position found by any particle across all time.
3. Velocity update equation
The core of PSO is the velocity update rule. The canonical inertia-weight form (Shi and Eberhart 1998) is:
where r₁ and r₂ are independent uniform random numbers drawn from [0, 1] at each step for each dimension. The three terms correspond to three distinct forces:
After computing the new velocity, position is updated simply as:
Velocity clamping
Without bounds on velocity, particles can fly arbitrarily fast and overshoot the optimum. Velocity clamping constrains each dimension:
A common heuristic sets vmax,d = (xmax,d − xmin,d) / 2 — half the search range per dimension. This prevents premature convergence caused by excessive exploration speed.
Stochastic components
The random scalars r₁ and r₂ are resampled per particle per dimension per iteration. This is essential: without randomness, particles in a symmetric configuration could lock into identical movement patterns and the swarm would collapse to a deterministic gradient-like update. The stochasticity also provides implicit restarts when a particle is far from both pbest and gbest.
4. Key parameters
| Parameter | Symbol | Typical value | Effect |
|---|---|---|---|
| Swarm size | N | 20–50 | Larger N → better coverage, more evaluations per iteration |
| Inertia weight | ω | 0.4–0.9 (linearly decayed) | High ω explores; low ω exploits. Decay from 0.9→0.4 is the most common schedule. |
| Cognitive coefficient | c₁ | 1.5–2.0 | Weight of personal best attraction. Too high → particles diverge independently. |
| Social coefficient | c₂ | 1.5–2.0 | Weight of global best attraction. Too high → premature convergence to local optima. |
| Velocity limit | vmax | ½·(xmax−xmin) | Prevents particle explosion. Too small → slow convergence. |
| Max iterations | T | 200–1000 | Problem-dependent; monitor gbest improvement per iteration. |
Inertia weight schedules
The inertia weight ω is the most sensitive parameter. A fixed low ω (≈ 0.4) causes rapid but possibly premature convergence. The linearly decreasing inertia weight (LDIW) balances exploration and exploitation:
Non-linear schedules (exponential decay, chaotic maps, adaptive ω based on swarm diversity) can outperform LDIW on specific problems but add complexity.
5. Convergence and stability
Clerc and Kennedy (2002) formalised the convergence conditions for the original PSO. Ignoring velocity clamping and treating the update as a stochastic difference equation, a single particle's expected trajectory converges if and only if:
Without parameter control, PSO oscillates and may not converge. The analysis shows that the optimal region in (ω, c₁, c₂) parameter space forms a triangular region known as the convergence triangle.
Stagnation and premature convergence
The most common failure mode is premature convergence: the entire swarm converges to a local optimum before exploring the full search space. This happens when:
- The social coefficient c₂ is too large relative to c₁, causing particles to rush toward the current gbest.
- The swarm lacks diversity — all pbest values cluster near one region.
- The objective landscape has a deceptive local optimum that attracts the swarm.
Mitigation strategies include random restart for stagnated particles, local topology (lbest PSO, Section 6), or increasing swarm size N.
6. Variants: CPSO, APSO, lbest
Constriction Factor PSO (CPSO)
Clerc (1999) derived a constriction factor χ that guarantees convergence without explicit velocity clamping:
CPSO with χ ≈ 0.73 and c₁ = c₂ = 2.05 is mathematically equivalent to inertia-weight PSO with ω = 0.729 and c₁ = c₂ = 1.495. The two formulations produce identical trajectories under the same random seeds.
Local Best (lbest) PSO
Instead of a single shared gbest, each particle i
communicates only with its k nearest neighbours (typically k = 2 or 3
in a ring topology). Each particle uses the lbest —
best position found among its local neighbourhood:
lbest PSO is slower to converge but substantially more resistant to premature convergence on multi-modal functions. Good solutions propagate through the ring gradually, allowing different parts of the swarm to explore different basins of attraction simultaneously.
Adaptive PSO (APSO)
APSO (Zhan et al. 2009) monitors the swarm's evolutionary state (exploration / exploitation / convergence / jumping-out) using a measure of particle fitness ranking and density, and adapts ω and c₁, c₂ in real time. In the exploration state, ω is increased; in convergence, ω is decreased and social weight c₂ is raised. This removes the need for offline parameter tuning.
Bare-Bones PSO
Kennedy (2003) proposed a Gaussian sampling version that eliminates velocity entirely:
Bare-Bones PSO is surprisingly competitive on many benchmarks and requires zero parameter tuning.
7. PSO vs GA vs SA
| Property | PSO | Genetic Algorithm | Simulated Annealing |
|---|---|---|---|
| Population | Yes (N particles) | Yes (N individuals) | No (single solution) |
| Gradient required | No | No | No |
| Memory used | pbest, gbest | None (selection pressure) | None |
| Exploration mechanism | Inertia + random components | Crossover + mutation | Random perturbation, temperature schedule |
| Discrete spaces | Possible (BPSO) | Native (binary GA) | Native (SA on combinatorial) |
| Implementation complexity | Very low | Medium | Low |
| Convergence speed | Fast (few dozen iterations) | Medium | Slow (many perturbations) |
| Global optimum guarantee | No | No | Yes (for T → 0 slowly enough) |
In practice, PSO is often preferred for continuous, low-to-medium dimensional optimisation (≤ 100 dimensions) because it is simple to implement, converges quickly, and has few hyperparameters. GAs come into their own for combinatorial problems and problems requiring structural variation (e.g. evolving neural network topology via NEAT). SA excels when the landscape has a clear annealing analogy or when a single high-quality solution is needed from a long run.
8. JavaScript implementation
// Particle Swarm Optimization — clean 2D example
// Minimises the Rastrigin function f(x,y) = 20 + x²−10cos(2πx) + y²−10cos(2πy)
// Global minimum at (0,0) = 0
function rastrigin(x, y) {
return 20 + x*x - 10*Math.cos(2*Math.PI*x)
+ y*y - 10*Math.cos(2*Math.PI*y);
}
// PSO configuration
const N = 40; // swarm size
const DIMS = 2; // dimensionality
const LB = -5.12; // lower bound per dimension
const UB = 5.12; // upper bound per dimension
const W_MAX = 0.9; // initial inertia
const W_MIN = 0.4; // final inertia
const C1 = 1.5; // cognitive coefficient
const C2 = 1.5; // social coefficient
const VMAX = (UB - LB) / 2; // velocity limit
const MAX_IT = 300;
// Initialise
const pos = Array.from({length: N}, () => Array.from({length: DIMS}, () => LB + Math.random()*(UB-LB)));
const vel = Array.from({length: N}, () => Array.from({length: DIMS}, () => (Math.random()-0.5)*2*VMAX));
const pbest = pos.map(p => [...p]);
const pfval = pbest.map(p => rastrigin(p[0], p[1]));
let gbest = pbest[pfval.indexOf(Math.min(...pfval))].slice();
let gfval = Math.min(...pfval);
// Main loop
for (let t = 0; t < MAX_IT; t++) {
const w = W_MAX - (W_MAX - W_MIN) * (t / MAX_IT); // linearly decreasing ω
for (let i = 0; i < N; i++) {
for (let d = 0; d < DIMS; d++) {
const r1 = Math.random(), r2 = Math.random();
// Velocity update
vel[i][d] = w * vel[i][d]
+ C1 * r1 * (pbest[i][d] - pos[i][d])
+ C2 * r2 * (gbest[d] - pos[i][d]);
// Velocity clamping
vel[i][d] = Math.max(-VMAX, Math.min(VMAX, vel[i][d]));
// Position update
pos[i][d] += vel[i][d];
// Boundary handling: reflect
if (pos[i][d] < LB) { pos[i][d] = LB; vel[i][d] *= -0.5; }
if (pos[i][d] > UB) { pos[i][d] = UB; vel[i][d] *= -0.5; }
}
// Evaluate
const fval = rastrigin(pos[i][0], pos[i][1]);
// Update personal best
if (fval < pfval[i]) {
pfval[i] = fval;
pbest[i] = pos[i].slice();
}
// Update global best
if (fval < gfval) {
gfval = fval;
gbest = pos[i].slice();
}
}
}
console.log('Best position:', gbest);
console.log('Best value:', gfval.toFixed(6));
// Typical result: value ≈ 0.000001, position ≈ [0.0001, 0.0002]
Boundary handling strategies
Four common approaches when a particle leaves the search bounds:
- Absorbing: clamp position to bound, zero the velocity component that hit the boundary.
- Reflecting: clamp position to bound, negate (and optionally scale) the velocity.
- Damping: clamp position, multiply velocity by −λ (0 < λ < 1).
- Periodic: wrap position to the opposite bound (toroidal space) — useful for periodic functions.
Reflecting boundary handling (used in the example above) is typically robust for most landscapes.
9. Applications
Engineering design
PSO is widely used for engineering optimisation where objectives are computationally expensive and gradients are unavailable or unreliable. Common examples: optimal PID controller gains, structural design of trusses and frames (weight minimisation subject to stress constraints), antenna array geometry for minimum side-lobe level.
Machine learning and neural networks
PSO can optimise neural network weights as an alternative to gradient-based backpropagation, particularly when the loss landscape has many saddle points or when gradient information is unavailable (e.g. non-differentiable activation functions). PSO-trained networks are less prone to getting trapped in poor local minima on small datasets.
Hyperparameter optimisation
PSO treats hyperparameters (learning rate, regularisation, architecture parameters) as dimensions in a continuous search space and optimises them against cross-validation loss. It naturally handles mixed continuous/integer dimensions when combined with rounding.
Multi-objective PSO (MOPSO)
For problems with multiple conflicting objectives (e.g. minimise cost and maximise performance), MOPSO maintains a Pareto archive of non-dominated solutions. The social attractor gbest is selected randomly from the archive, weighted by archive crowding. Over time the swarm traces out an approximation of the Pareto front.
Power systems and renewable energy
Economic dispatch, optimal power flow, wind turbine layout optimisation (maximise farm output while minimising wake losses) — all involve non-convex, non-differentiable objectives where PSO's gradient-free nature is a decisive advantage.