The Problem
With 350 simulation pages, the old approach — filtering a hard-coded JavaScript array — was getting unwieldy. Users searching for "refraction" should find Snell's Law, Total Internal Reflection, and Rainbow Formation, not just pages with the exact word "refraction" in the title. We needed full-text search with relevance ranking, all running in the browser with no backend.
Data Structure: Inverted Index
The search index is a plain JavaScript object built at deploy time from each simulation's title, description, category, and tags. Every token maps to a posting list: an array of document IDs with their term-frequency scores.
```javascript
// Simplified index structure
const index = {
  "refraction": [
    { id: "snells-law", tf: 0.82 },
    { id: "total-internal-refl", tf: 0.71 },
    { id: "rainbow", tf: 0.44 },
  ],
  "lorenz": [
    { id: "lorenz", tf: 1.00 },
    { id: "bifurcation", tf: 0.18 },
  ],
  // ... ~4,200 tokens in total: 28 kB minified, 8.4 kB after gzip
};
```
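For illustration, the deploy-time build step might look roughly like the sketch below. It rests on assumptions the post does not state: each simulation record has id, title, description, category and tags fields; the 3× title weight is applied here, at build time; and each document's weighted counts are divided by its largest count, so its most prominent term scores 1.0. The names buildIndex, FIELD_WEIGHTS and simpleTokenize are hypothetical, and the tokenizer is only a placeholder for the normalisation pipeline described in the Tokenisation section.

```javascript
// Hypothetical field weights: title tokens count 3x, everything else 1x.
const FIELD_WEIGHTS = { title: 3, description: 1, category: 1, tags: 1 };

// Placeholder tokenizer: lower-case and split on non-alphanumerics.
const simpleTokenize = (text) =>
  text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);

function buildIndex(docs) {
  // Null-prototype object so a token like "constructor" cannot collide
  // with properties inherited from Object.prototype.
  const index = Object.create(null);
  for (const doc of docs) {
    // Weighted count of each token across all fields of this document.
    const weights = new Map();
    for (const [field, w] of Object.entries(FIELD_WEIGHTS)) {
      const raw = doc[field];
      const text = Array.isArray(raw) ? raw.join(" ") : raw ?? "";
      for (const tok of simpleTokenize(text)) {
        weights.set(tok, (weights.get(tok) ?? 0) + w);
      }
    }
    // Normalise by the document's largest weight so its top term scores 1.0.
    const max = Math.max(...weights.values(), 1);
    for (const [tok, w] of weights) {
      (index[tok] ??= []).push({ id: doc.id, tf: Number((w / max).toFixed(2)) });
    }
  }
  // Keep each posting list ordered by descending score.
  for (const list of Object.values(index)) list.sort((a, b) => b.tf - a.tf);
  return index;
}
```

Serialising the result with JSON.stringify would then give a file shaped like the index above.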
The relevance score is TF-IDF with a title boost: a token in the title counts 3× as much as the same token in the description. Multi-word queries use AND semantics: the posting lists of all query terms are intersected, and each remaining document's per-term scores are summed.
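A minimal sketch of the query side, assuming the tf values in each posting already fold in the 3× title weight at build time and using the textbook idf = log(N / df), where N is the number of documents (350 here) and df is the length of the token's posting list. The function name search is hypothetical, and the exact TF-IDF variant used in production is not stated in the post.

```javascript
// AND-intersect the posting lists of all query tokens, then rank the
// surviving documents by the sum of tf * idf over the query tokens.
function search(index, queryTokens, totalDocs) {
  if (queryTokens.length === 0) return [];
  const has = (t) => Object.prototype.hasOwnProperty.call(index, t);
  const lists = queryTokens.map((t) => (has(t) ? index[t] : []));

  // Start from the shortest list so the candidate set is small from the outset.
  lists.sort((a, b) => a.length - b.length);
  const scores = new Map(lists[0].map((p) => [p.id, 0]));

  for (const list of lists) {
    const idf = Math.log(totalDocs / Math.max(list.length, 1));
    const present = new Map(list.map((p) => [p.id, p.tf]));
    for (const id of [...scores.keys()]) {
      if (!present.has(id)) scores.delete(id); // AND: drop docs missing this term
      else scores.set(id, scores.get(id) + present.get(id) * idf);
    }
  }

  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

With the sample index above, search(index, ["refraction", "lorenz"], 350) returns an empty array, because no document appears in both posting lists.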
Prefix Autocomplete: Trie
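The autocomplete dropdown ("type 'ref' and see 'refraction', 'reflection', 'refractive index'") is backed by a trie (prefix tree) over the same token set. Walking down to the prefix node is O(m), where m is the prefix length, regardless of vocabulary size; collecting the suggestions then costs time proportional to the part of the subtree it visits.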
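```javascript
class Trie {
  constructor() {
    this.root = {};
  }

  insert(word) {
    let node = this.root;
    for (const ch of word) {
      node[ch] ??= {};
      node = node[ch];
    }
    node.$end = true; // marks the end of a complete token
  }

  suggest(prefix, limit = 5) {
    // Walk down to the prefix node: O(m) for a prefix of length m.
    let node = this.root;
    for (const ch of prefix) {
      if (!node[ch]) return [];
      node = node[ch];
    }
    // DFS from the prefix node, collecting up to `limit` words.
    return this._collect(node, prefix, [], limit);
  }

  // One possible DFS helper: visits children in alphabetical order
  // and stops as soon as `limit` words have been collected.
  _collect(node, word, out, limit) {
    if (out.length >= limit) return out;
    if (node.$end) out.push(word);
    for (const ch of Object.keys(node).sort()) {
      if (ch === "$end") continue;
      if (out.length >= limit) break;
      this._collect(node[ch], word + ch, out, limit);
    }
    return out;
  }
}
```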
Tokenisation & Normalisation
Raw text goes through four steps before indexing:
- Lower-case + Unicode normalise (NFC) — "Lorenz" and "lorenz" hit the same token.
- Stop-word removal — 63 English stop words removed to reduce index size.
- Porter stemmer (light) — "refracting", "refracted", "refraction" → "refract". Keeps index tight without losing recall.
- Synonym map — hand-curated 40-entry map: "fourier" → also indexes "fft", "spectrum"; "lorenz" → "chaos", "attractor".
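The pipeline above can be sketched as a single tokenize function. This is an approximation under stated assumptions: the stop-word set is a small illustrative subset rather than the 63-word list, only the two synonym entries quoted above are included, and stem is a crude suffix stripper standing in for a light Porter stemmer (it turns "chaos" into "chao", which is harmless as long as indexing and querying run the same function). The sketch also expands synonyms before stemming so the map can be keyed on unstemmed words; the post lists stemming first, and either order works if both sides agree.

```javascript
// Illustrative subset only; the real list has 63 words.
const STOP_WORDS = new Set(["a", "an", "and", "at", "in", "is", "of", "on", "the", "to"]);

// Two entries from the hand-curated synonym map; the rest are omitted.
// A Map avoids collisions with Object.prototype keys such as "constructor".
const SYNONYMS = new Map([
  ["fourier", ["fft", "spectrum"]],
  ["lorenz", ["chaos", "attractor"]],
]);

// Crude stand-in for a light Porter stemmer: strip one common suffix,
// keeping at least three characters of stem.
function stem(token) {
  for (const suffix of ["ing", "ion", "ed", "s"]) {
    if (suffix === "s" && token.endsWith("ss")) continue; // "glass" stays "glass"
    if (token.endsWith(suffix) && token.length - suffix.length >= 3) {
      return token.slice(0, -suffix.length);
    }
  }
  return token;
}

function tokenize(text) {
  return text
    .normalize("NFC")                      // canonical Unicode form
    .toLowerCase()                         // "Lorenz" -> "lorenz"
    .split(/[^\p{L}\p{N}]+/u)              // split on anything but letters/digits
    .filter((t) => t.length > 1 && !STOP_WORDS.has(t))
    .flatMap((t) => [t, ...(SYNONYMS.get(t) ?? [])])
    .map(stem);
}
```

Running tokenize("refracting"), tokenize("refracted") and tokenize("refraction") all yield ["refract"].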
Performance
The index is loaded once, on the first search interaction, via a dynamic import(), so pages that never use search pay nothing for it. On a cold cache it fetches in ~40 ms on a 3G connection; subsequent queries are pure in-memory lookups.
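The load-once behaviour can be sketched as a memoised loader wrapped around a dynamic import(). The module path ./search-index.js and the names lazyOnce and getIndex are hypothetical, since the post does not name the built index file; clearing the cached promise after a failed fetch is an added assumption so that one flaky request does not disable search for the rest of the session.

```javascript
// Wrap an async loader so it runs at most once; concurrent callers share
// the same pending promise.
function lazyOnce(load) {
  let pending = null;
  return () => {
    pending ??= load().catch((err) => {
      pending = null; // allow a retry on the next interaction
      throw err;
    });
    return pending;
  };
}

// Hypothetical module path; assumes the index is the module's default export.
const getIndex = lazyOnce(() => import("./search-index.js").then((m) => m.default));

// In the page, the first interaction would trigger the fetch, e.g.:
// searchInput.addEventListener("focus", () => getIndex(), { once: true });
```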
URL State & Deep Links
Search queries are stored in the URL as ?q=lorenz+attractor, both for shareability and so that the browser's Back button works as expected. The History API is updated with replaceState on every keystroke (debounced to 150 ms), which rewrites the current entry instead of adding one per keystroke, and with pushState only when the user navigates to a result.
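A sketch of this scheme using the standard URL, URLSearchParams and History APIs. The helper names queryUrl, debounce, syncQueryToUrl and openResult are hypothetical, as is the detail that a result is opened by pushing its path; the post only says that pushState is used at that point. syncQueryToUrl would be called from the search input's input handler.

```javascript
// Build the URL for a query; URLSearchParams encodes spaces as "+".
function queryUrl(query, base) {
  const url = new URL(base);
  const q = query.trim();
  if (q) url.searchParams.set("q", q);
  else url.searchParams.delete("q");
  return url.toString();
}

// Trailing-edge debounce: `fn` runs once calls have stopped for `ms` ms.
function debounce(fn, ms) {
  let timer;
  return (...args) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), ms);
  };
}

// Typing rewrites the current history entry rather than adding new ones.
const syncQueryToUrl = debounce((query) => {
  history.replaceState(history.state, "", queryUrl(query, location.href));
}, 150);

// Choosing a result adds a real entry, so Back returns to the results list.
// `resultPath` and the rendering step are hypothetical and app-specific.
function openResult(resultPath) {
  history.pushState(null, "", resultPath);
  // ...render the selected simulation here...
}
```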
Search History via localStorage
Recent searches are persisted to localStorage under the key sim_search_history: up to 10 entries, stored as a JSON array. The history is shown as chips below the input on focus. No PII is ever stored, only the raw query string, and users can clear the history with a single button.
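The storage logic can be sketched in a few functions. The key and the 10-entry cap come from the post; the names readHistory, recordSearch and clearHistory are hypothetical, as are the choices to keep the most recent query first and drop duplicates. The storage object is passed in (window.localStorage in the browser) so the logic can be exercised without a DOM.

```javascript
// Key and cap from the post.
const HISTORY_KEY = "sim_search_history";
const MAX_ENTRIES = 10;

// `storage` is anything implementing the Web Storage interface.
function readHistory(storage) {
  try {
    const parsed = JSON.parse(storage.getItem(HISTORY_KEY) ?? "[]");
    return Array.isArray(parsed) ? parsed : [];
  } catch {
    return []; // malformed JSON: treat as empty
  }
}

function recordSearch(storage, query) {
  const q = query.trim();
  if (!q) return readHistory(storage);
  // Most recent first, no duplicates, capped at MAX_ENTRIES.
  const next = [q, ...readHistory(storage).filter((h) => h !== q)].slice(0, MAX_ENTRIES);
  try {
    storage.setItem(HISTORY_KEY, JSON.stringify(next));
  } catch {
    // Storage full or disabled: history is optional, so fail quietly.
  }
  return next;
}

// Backs the single "clear history" button.
function clearHistory(storage) {
  storage.removeItem(HISTORY_KEY);
}
```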
Why not use a hosted search service? Cost, privacy, and offline support. Algolia/Typesense add a $50–200/mo infrastructure dependency and break offline mode entirely. Our 8.4 kB index fetches once and lives in the service worker cache — search works on a plane with no WiFi, and there is no search API key to rotate or leak.