Devlog #24 — Client-Side Search Architecture: Inverted Index, Trie and URL State for 350 Pages

The Problem

With 350 simulation pages, the old approach — filtering a hard-coded JavaScript array — was getting unwieldy. Users searching for "refraction" should find Snell's Law, Total Internal Reflection, and Rainbow Formation, not just pages with the exact word "refraction" in the title. We needed full-text search with relevance ranking, all running in the browser with no backend.

Data Structure: Inverted Index

The search index is a plain JavaScript object built at deploy time from each simulation's title, description, category, and tags. Every token maps to a posting list: an array of document IDs with their term-frequency scores.

    // Simplified index structure
    const index = {
      "refraction": [
        { id: "snells-law", tf: 0.82 },
        { id: "total-internal-refl", tf: 0.71 },
        { id: "rainbow", tf: 0.44 },
      ],
      "lorenz": [
        { id: "lorenz", tf: 1.00 },
        { id: "bifurcation", tf: 0.18 },
      ],
      // ... ~4 200 tokens, 28 kB minified + gzip → 8.4 kB
    };
        

Relevance is scored with TF-IDF plus a title boost: a token in the title counts 3× as much as one in the description. Multi-word queries use AND semantics: the posting lists are intersected, and each surviving document's per-token scores are summed.
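A minimal sketch of how the intersection and score summation might look, using a miniature index in the same shape as the structure above. The `search` and `idf` functions, the sample postings, and the `log(N / df)` IDF variant are assumptions for illustration; the title boost is assumed to be folded into each posting's `tf` at build time.

```javascript
// Hypothetical miniature index; the sample postings are made up for illustration.
const index = {
  refraction: [
    { id: "snells-law", tf: 0.82 },
    { id: "total-internal-refl", tf: 0.71 },
    { id: "rainbow", tf: 0.44 },
  ],
  rainbow: [
    { id: "rainbow", tf: 0.9 },
    { id: "snells-law", tf: 0.1 },
  ],
};
const TOTAL_DOCS = 350;

// Assumed IDF variant: log(total documents / documents containing the token).
function idf(postings) {
  return Math.log(TOTAL_DOCS / postings.length);
}

function search(tokens, idx = index) {
  if (tokens.length === 0) return [];
  let scores = null; // doc id -> summed score, for docs matching every token so far
  for (const token of tokens) {
    const known = Object.prototype.hasOwnProperty.call(idx, token);
    if (!known) return []; // AND semantics: one unknown token means no results
    const postings = idx[token];
    const weight = idf(postings);
    const next = new Map();
    for (const { id, tf } of postings) {
      if (scores === null) next.set(id, tf * weight);
      else if (scores.has(id)) next.set(id, scores.get(id) + tf * weight);
    }
    scores = next; // documents missing this token have been dropped
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

With this sample data, `search(["refraction", "rainbow"])` keeps only the two documents that appear in both posting lists and ranks `rainbow` first.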

Prefix Autocomplete: Trie

The autocomplete dropdown (type "ref" and see "refraction", "reflection", "refractive index") is backed by a trie (prefix tree) over the same token set. Finding the prefix node is O(m), where m is the prefix length, regardless of vocabulary size; collecting suggestions then walks only the subtree below that node and stops at the result limit.

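    class Trie {
      constructor() { this.root = {}; }

      insert(word) {
        let node = this.root;
        for (const ch of word) {
          node[ch] ??= {};
          node = node[ch];
        }
        node.$end = true; // marks the end of a complete word
      }

      suggest(prefix, limit = 5) {
        let node = this.root;
        for (const ch of prefix) {
          if (!node[ch]) return [];
          node = node[ch];
        }
        // DFS from prefix node, collect up to `limit` words
        return this._collect(node, prefix, [], limit);
      }

      // Helper sketch (body assumed): depth-first walk that stops at `limit` words
      _collect(node, word, out, limit) {
        if (out.length >= limit) return out;
        if (node.$end) out.push(word);
        for (const ch of Object.keys(node)) {
          if (ch === "$end") continue;
          if (out.length >= limit) break;
          this._collect(node[ch], word + ch, out, limit);
        }
        return out;
      }
    }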
        

Tokenisation & Normalisation

Raw text goes through four steps before indexing:

1. Lower-case + Unicode normalise (NFC): "Lorenz" and "lorenz" hit the same token.
2. Stop-word removal: 63 English stop words removed to reduce index size.
3. Porter stemmer (light): "refracting", "refracted", "refraction" → "refract". Keeps index tight without losing recall.
4. Synonym map: hand-curated 40-entry map. "fourier" → also indexes "fft", "spectrum"; "lorenz" → "chaos", "attractor".
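The four steps above can be sketched as a single function. This is an illustration under stated assumptions, not the site's actual pipeline: `STOP_WORDS` and `SYNONYMS` are tiny illustrative subsets of the real lists, `lightStem` is a crude suffix stripper standing in for the Porter stemmer, and looking up synonyms by the stemmed token is a guess.

```javascript
// Illustrative subsets only; the full 63 stop words and 40 synonym entries are not listed here.
const STOP_WORDS = new Set(["the", "a", "an", "of", "and", "in", "to", "is"]);
const SYNONYMS = new Map([
  ["fourier", ["fft", "spectrum"]],
  ["lorenz", ["chaos", "attractor"]],
]);

// Crude stand-in for the light Porter stemmer. This is NOT the Porter algorithm:
// it only strips a few suffixes when at least four characters would remain.
function lightStem(word) {
  for (const suffix of ["ing", "ion", "ed"]) {
    if (word.endsWith(suffix) && word.length - suffix.length >= 4) {
      return word.slice(0, -suffix.length);
    }
  }
  return word;
}

function tokenise(text) {
  const words = text
    .normalize("NFC")              // step 1: Unicode normalise
    .toLowerCase()                 // step 1: lower-case
    .split(/[^\p{L}\p{N}]+/u)      // split on anything that is not a letter or digit
    .filter((w) => w && !STOP_WORDS.has(w)); // step 2: stop-word removal
  const tokens = [];
  for (const word of words) {
    const stem = lightStem(word);  // step 3: stemming
    tokens.push(stem);
    // step 4: synonym expansion, looked up by stem (an assumption)
    for (const synonym of SYNONYMS.get(stem) ?? []) tokens.push(synonym);
  }
  return tokens;
}
```

For example, `tokenise("The Refracting of Lorenz")` drops both stop words, stems "refracting" to "refract", and adds "chaos" and "attractor" after "lorenz".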

Performance

Index size (gzip): 8.4 kB
Query latency (p99): <6 ms
Autocomplete latency (p99): <2 ms

The index is loaded once on first search interaction via a dynamic import() — zero cost on pages that never use search. On a cold cache it fetches in ~40 ms on a 3G connection; subsequent queries are pure in-memory lookups.
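One way to get load-once behaviour is to memoise the promise returned by the loader. The sketch below assumes a hypothetical module path `./search-index.js` (shown only in a comment) and uses a stand-in loader so it runs outside a browser; `once` and `getIndex` are illustrative names.

```javascript
// Wraps an async loader so it runs at most once; later calls reuse the same promise.
function once(loader) {
  let pending = null;
  return () => (pending ??= loader());
}

// In the browser the loader would be a dynamic import (module path is hypothetical):
// const getIndex = once(() => import("./search-index.js").then((m) => m.default));

// Stand-in loader so the sketch runs anywhere; it counts how often it is invoked.
let loadCount = 0;
const getIndex = once(async () => {
  loadCount += 1;
  return { lorenz: [{ id: "lorenz", tf: 1.0 }] };
});
```

Calling `getIndex()` from the first focus or input event on the search box would trigger the fetch; every later call resolves from the cached promise without touching the network.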

URL State & Deep Links

Search queries are stored in the URL as ?q=lorenz+attractor, both for shareability and so the browser back button works as expected. While the user types, the URL is updated with history.replaceState (debounced to 150 ms) so that intermediate keystrokes do not pile up as back-button entries; history.pushState is called only when the user navigates to a result.
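A sketch of the URL round trip, assuming the standard `URLSearchParams` API, which encodes spaces as `+` and so produces the format shown above. The helper names (`urlForQuery`, `queryFromSearch`, `debounce`) and the element name in the commented wiring are hypothetical.

```javascript
// Build the search URL for a query; URLSearchParams encodes spaces as "+".
function urlForQuery(query, pathname = "/") {
  const params = new URLSearchParams();
  if (query) params.set("q", query);
  const qs = params.toString();
  return qs ? `${pathname}?${qs}` : pathname;
}

// Read the query back from a location.search string such as "?q=lorenz+attractor".
function queryFromSearch(search) {
  return new URLSearchParams(search).get("q") ?? "";
}

// Generic trailing-edge debounce: only the last call within `waitMs` runs.
function debounce(fn, waitMs) {
  let timer;
  return (...args) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), waitMs);
  };
}

// Browser wiring, left as comments so the sketch runs outside a browser
// (`input` is a hypothetical reference to the search box):
// const syncUrl = debounce(
//   (q) => history.replaceState(null, "", urlForQuery(q, location.pathname)),
//   150
// );
// input.addEventListener("input", (e) => syncUrl(e.target.value));
```

On page load, `queryFromSearch(location.search)` would restore the query from a shared link.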

Search History via localStorage

Recent searches are persisted to localStorage under the key sim_search_history: up to 10 entries, stored as a JSON array. The history is shown as chips below the input on focus. Only the raw query string is stored, never any PII. Users can clear history with a single button.
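A minimal sketch of the history store, written against any object with the `getItem`/`setItem`/`removeItem` methods of the Web Storage API so it can run outside a browser. The function names are illustrative, and the ordering policy (most recent first, duplicates moved to the front) is an assumption.

```javascript
const HISTORY_KEY = "sim_search_history";
const MAX_ENTRIES = 10;

// `storage` is any Storage-like object, e.g. window.localStorage.
function readHistory(storage) {
  try {
    const parsed = JSON.parse(storage.getItem(HISTORY_KEY) ?? "[]");
    return Array.isArray(parsed) ? parsed : [];
  } catch {
    return []; // corrupted JSON: fall back to an empty history
  }
}

function rememberQuery(storage, query) {
  const q = query.trim();
  if (!q) return readHistory(storage);
  // Assumed policy: most recent first, no duplicates, capped at MAX_ENTRIES.
  const rest = readHistory(storage).filter((item) => item !== q);
  const next = [q, ...rest].slice(0, MAX_ENTRIES);
  storage.setItem(HISTORY_KEY, JSON.stringify(next));
  return next;
}

function clearHistory(storage) {
  storage.removeItem(HISTORY_KEY);
}

// In-memory stand-in for localStorage, for running the sketch outside a browser.
function memoryStorage() {
  const data = new Map();
  return {
    getItem: (key) => (data.has(key) ? data.get(key) : null),
    setItem: (key, value) => void data.set(key, String(value)),
    removeItem: (key) => void data.delete(key),
  };
}
```

In the page itself, `rememberQuery(localStorage, query)` would run when a search is submitted and `clearHistory(localStorage)` would back the clear button.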

Why not use a hosted search service? Cost, privacy, and offline support. Algolia/Typesense add a $50–200/mo infrastructure dependency and break offline mode entirely. Our 8.4 kB index fetches once and lives in the service worker cache — search works on a plane with no WiFi, and there is no search API key to rotate or leak.