Devlog #17 — Building Fast Search & Category Filtering for 250+ Simulations

When the library hit 250 simulations, browsing became a problem. Users wanted to find "something with fluid" or filter by "physics + beginner". Here is how we built instant client-side search with zero dependencies — inverted index, trie prefix matching, and URL-serialised filter state.

250+ simulations indexed
<5 ms search latency
18 KB search index gzipped
0 dependencies

The Problem: 250 Items in a Grid

The original homepage showed all simulations in a responsive grid — fine at 50, manageable at 100, unnavigable at 250. Analytics confirmed the problem: users arriving via search landed on specific simulation pages, but users landing on the homepage scrolled for a few seconds then left.

We needed search and filtering. The constraints: no server, no build step, no external search service. Everything had to run in the browser from a static JSON file.

Step 1 — The Search Index JSON

A Python script walks every simulation directory, reads metadata from each index.html (title, description, category tags, keywords), and emits a compact search-index.json:

// search-index.json (abbreviated)
[
  {
    "id": "double-slit",
    "title": "Double-Slit Experiment",
    "desc": "Quantum wave-particle duality...",
    "cats": ["quantum", "optics"],
    "tags": ["interference", "wave", "photon"],
    "difficulty": "intermediate"
  },
  ...
]

The full index for 250 simulations is 142 KB uncompressed and 18 KB gzipped, small enough to download quickly on a first visit and, with normal caching headers, to be served from the browser's HTTP cache on repeat visits.

Step 2 — Inverted Index for Full-Text Search

A plain array scan of 250 items on every keystroke would be fast enough (250 objects is trivial for a CPU), but we wanted prefix matching and ranked results. An inverted index maps every word token to the list of document IDs containing it:

function buildInvertedIndex(docs) {
  const index = new Map(); // token → Set of doc IDs
  for (const doc of docs) {
    const tokens = tokenise(doc.title + ' ' + doc.desc + ' ' + doc.tags.join(' '));
    for (const tok of tokens) {
      if (!index.has(tok)) index.set(tok, new Set());
      index.get(tok).add(doc.id);
    }
  }
  return index;
}

function tokenise(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9 ]/g, ' ')
    .split(/\s+/)
    // Drop empty strings and tokens shorter than 3 characters
    // (a crude stand-in for stop-word removal, not a real stop-word list)
    .filter(t => t.length > 2);
}
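To make the lookup side concrete, here is a minimal sketch of querying such an index. It assumes a multi-word query should match only documents that contain every token (AND semantics), which the devlog does not specify. The `searchExact` helper and the two sample documents are hypothetical; `tokenise` and `buildInvertedIndex` are repeated so the block runs on its own.

```javascript
// Repeated from the devlog so this sketch is self-contained.
function tokenise(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9 ]/g, ' ')
    .split(/\s+/)
    .filter(t => t.length > 2);
}

function buildInvertedIndex(docs) {
  const index = new Map(); // token → Set of doc IDs
  for (const doc of docs) {
    for (const tok of tokenise(doc.title + ' ' + doc.desc + ' ' + doc.tags.join(' '))) {
      if (!index.has(tok)) index.set(tok, new Set());
      index.get(tok).add(doc.id);
    }
  }
  return index;
}

// Hypothetical helper: IDs of docs that contain every query token.
function searchExact(index, query) {
  const tokens = tokenise(query);
  if (tokens.length === 0) return new Set();
  // Intersect starting from the smallest posting set so work shrinks quickly.
  const postings = tokens
    .map(t => index.get(t) ?? new Set())
    .sort((a, b) => a.size - b.size);
  let result = new Set(postings[0]);
  for (const ids of postings.slice(1)) {
    result = new Set([...result].filter(id => ids.has(id)));
  }
  return result;
}

// Sample data invented for illustration.
const sampleDocs = [
  { id: 'double-slit', title: 'Double-Slit Experiment', desc: 'Quantum wave-particle duality', tags: ['interference', 'wave', 'photon'] },
  { id: 'ripple-tank', title: 'Ripple Tank', desc: 'Water wave interference patterns', tags: ['wave', 'water'] },
];
const sampleIndex = buildInvertedIndex(sampleDocs);

const waveHits = searchExact(sampleIndex, 'wave');                // both documents
const quantumWaveHits = searchExact(sampleIndex, 'quantum wave'); // only 'double-slit'
```

Because `tokenise` drops tokens shorter than three characters, a query such as "zz" produces no tokens and returns an empty set here; short queries are better served by the prefix trie in the next step.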

Step 3 — Trie for Prefix Matching

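Users type "flu" and expect "fluid dynamics" and "SPH fluid" to appear before they finish the word. An inverted index only matches exact tokens. A trie (prefix tree) solves this: each node represents one character, and each path from the root spells a prefix of at least one indexed token. Because every node stores the IDs of the documents whose tokens pass through it, finding all documents with a word that starts with "flu" takes O(prefix_length) steps, independent of library size. One caveat: a prefix trie matches only the start of a token, so "fluid" does not find "microfluid" unless token suffixes are indexed as well.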

class Trie {
  constructor() {
    // The root carries its own ID set so an empty prefix returns a Set too
    this.root = { _ids: new Set() };
  }

  insert(word, docId) {
    let node = this.root;
    node._ids.add(docId);
    for (const ch of word) {
      if (!node[ch]) node[ch] = { _ids: new Set() };
      node = node[ch];
      node._ids.add(docId); // record the ID on every prefix node along the path
    }
  }

  search(prefix) {
    let node = this.root;
    for (const ch of prefix) {
      if (!node[ch]) return new Set();
      node = node[ch];
    }
    return node._ids; // all docs containing a word starting with prefix
  }
}
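A short usage sketch may help show what the trie does and does not match. The token lists and document IDs below are invented for illustration, and the class is a compact copy of the devlog's trie with one assumed tweak: the root also holds an ID set, so an empty prefix returns a Set.

```javascript
// Compact copy of the devlog's trie. The ID set on the root is an
// assumption of this sketch so that search('') returns a Set as well.
class Trie {
  constructor() { this.root = { _ids: new Set() }; }

  insert(word, docId) {
    let node = this.root;
    node._ids.add(docId);
    for (const ch of word) {
      if (!node[ch]) node[ch] = { _ids: new Set() };
      node = node[ch];
      node._ids.add(docId);
    }
  }

  search(prefix) {
    let node = this.root;
    for (const ch of prefix) {
      if (!node[ch]) return new Set();
      node = node[ch];
    }
    return node._ids;
  }
}

// Hypothetical token lists; in practice they would come from tokenise().
const tokensByDoc = {
  'sph-fluid': ['sph', 'fluid'],
  'fluid-dynamics': ['fluid', 'dynamics'],
  'microfluidics': ['microfluid', 'channel'],
};

const trie = new Trie();
for (const [docId, tokens] of Object.entries(tokensByDoc)) {
  for (const tok of tokens) trie.insert(tok, docId);
}

// Both docs with a token starting with "flu" are found. 'microfluidics'
// is absent because "microfluid" does not start with "flu".
const fluHits = [...trie.search('flu')].sort();
const microHits = [...trie.search('micro')];
const noHits = [...trie.search('plasma')];
```

Storing an ID set on every node trades memory for lookup speed: the answer is ready as soon as the last prefix character has been consumed, with no subtree walk.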

Step 4 — Multi-Tag Category Filtering

The filter panel uses bit-flag intersection. Each category is assigned a bit position; every simulation is represented as a bitmask of its categories. Multi-tag filtering is a single bitwise AND:

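// Build bitmask for each sim at load time.
// JavaScript bitwise operators work on 32-bit integers, so this scheme
// supports at most 32 categories.
const CAT_BITS = { physics: 1, quantum: 2, biology: 4, chemistry: 8, economics: 16 };

sims.forEach(s => {
  // Categories missing from CAT_BITS contribute no bits
  s.mask = s.cats.reduce((acc, c) => acc | (CAT_BITS[c] ?? 0), 0);
});

// Check if sim matches all selected filters (an empty selection matches everything)
function matchesFilter(sim, selectedMask) {
  return selectedMask === 0 || (sim.mask & selectedMask) === selectedMask;
}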
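As a worked example of the AND check, the sketch below builds the selected mask from a list of category names and filters three sample sims. The `maskFor` helper and the sample sims are hypothetical; `CAT_BITS` and `matchesFilter` are copied from the devlog.

```javascript
const CAT_BITS = { physics: 1, quantum: 2, biology: 4, chemistry: 8, economics: 16 };

// Hypothetical helper: turn a list of category names into a bitmask.
function maskFor(cats) {
  return cats.reduce((acc, c) => acc | (CAT_BITS[c] ?? 0), 0);
}

function matchesFilter(sim, selectedMask) {
  return selectedMask === 0 || (sim.mask & selectedMask) === selectedMask;
}

// Invented sample sims, each given a mask at load time.
const sims = [
  { id: 'double-slit', cats: ['quantum', 'physics'] },       // mask 3
  { id: 'reaction-diffusion', cats: ['chemistry', 'biology'] }, // mask 12
  { id: 'pendulum', cats: ['physics'] },                      // mask 1
].map(s => ({ ...s, mask: maskFor(s.cats) }));

// Selecting physics AND quantum gives 1 | 2 = 3.
const selected = maskFor(['physics', 'quantum']);

// double-slit: 3 & 3 === 3 → match
// pendulum:    1 & 3 === 1 → no match (quantum bit missing)
const matches = sims.filter(s => matchesFilter(s, selected)).map(s => s.id);

// An empty selection (mask 0) keeps every sim.
const unfiltered = sims.filter(s => matchesFilter(s, 0)).map(s => s.id);
```

Comparing against `selectedMask` rather than testing for a non-zero result is what makes this an "all selected tags" filter; testing `(sim.mask & selectedMask) !== 0` would give "any selected tag" instead.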

Step 5 — URL State Serialisation

The search query and active filters are serialised into the URL on every change, so users can bookmark and share filtered views, for example /?q=fluid&cat=physics%2Cchemistry&diff=beginner (URLSearchParams percent-encodes the comma between categories). The state is read back on page load and the UI is restored without any page transition.

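function pushState(query, cats, difficulty) {
  const params = new URLSearchParams();
  if (query) params.set('q', query);
  if (cats.length) params.set('cat', cats.join(','));
  if (difficulty) params.set('diff', difficulty);
  const qs = params.toString();
  // replaceState rewrites the current history entry, so typing does not
  // add one entry per keystroke. With no active state, fall back to the
  // bare path instead of a dangling "?".
  history.replaceState({}, '', qs ? '?' + qs : location.pathname);
}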
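The reverse direction, reading the state back on page load, might look like the sketch below. The `readState` name and the shape of the returned object are assumptions; only the `q`, `cat`, and `diff` parameter names come from the devlog.

```javascript
// Hypothetical inverse of pushState: parse a query string into UI state.
function readState(search) {
  // URLSearchParams ignores a leading "?" and percent-decodes values.
  const params = new URLSearchParams(search);
  const cat = params.get('cat');
  return {
    query: params.get('q') ?? '',
    cats: cat ? cat.split(',').filter(Boolean) : [],
    difficulty: params.get('diff') ?? '',
  };
}

// Encoded and literal commas both decode to the same category list.
const restored = readState('?q=fluid&cat=physics%2Cchemistry&diff=beginner');
const literalComma = readState('?cat=physics,chemistry');

// A bare URL yields empty state, i.e. no search text and no filters.
const emptyState = readState('');
```

In the browser this would be called as `readState(location.search)` once at startup, with the result used to fill the search box and tick the filter checkboxes before the first render.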

Performance Results

Before: 250 visible. All simulations always rendered: 250 DOM nodes, layout thrash on resize.
After: <5 ms search. Trie lookup + bitmask filter + DOM update for matching sims only; imperceptible on any device.

Before: ~40 s. Time to find a specific simulation by scrolling for first-time visitors.
After: ~3 s. Time to find a simulation with 2-character prefix search and one category filter.
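The combined lookup path (prefix search, then bitmask filter) can be sketched in a few lines. This is an illustration under assumptions, not the site's actual code: `runSearch` is a hypothetical name, the query is treated as a single prefix, and a stub function stands in for the trie so the block stays self-contained. The DOM update is left out.

```javascript
// prefixLookup: any function mapping a prefix to a Set of doc IDs,
// for example trie.search.bind(trie).
function runSearch(sims, prefixLookup, query, selectedMask) {
  const q = query.trim().toLowerCase();
  const ids = q ? prefixLookup(q) : null; // null means "no text filter"
  return sims.filter(s =>
    (ids === null || ids.has(s.id)) &&
    (selectedMask === 0 || (s.mask & selectedMask) === selectedMask)
  );
}

// Invented sample data; masks use physics = 1 and chemistry = 8.
const sims = [
  { id: 'sph-fluid', mask: 1 },
  { id: 'fluid-mixing', mask: 1 | 8 },
  { id: 'pendulum', mask: 1 },
];

// Stub lookup standing in for the trie: matches any word in the ID
// that starts with the prefix.
const stubLookup = prefix =>
  new Set(
    sims
      .filter(s => s.id.split('-').some(w => w.startsWith(prefix)))
      .map(s => s.id)
  );

// Two-character prefix plus one category filter (chemistry).
const hits = runSearch(sims, stubLookup, 'fl', 8).map(s => s.id);

// Prefix only, and no constraints at all.
const prefixOnly = runSearch(sims, stubLookup, 'fl', 0).map(s => s.id);
const everything = runSearch(sims, stubLookup, '', 0).map(s => s.id);
```

Only the sims returned by `runSearch` then need to be shown, which is what keeps the DOM work proportional to the number of matches rather than to the size of the library.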

Lessons Learned

Open architecture: The search index JSON is generated by a Python script that reads simulation metadata. Adding a new simulation automatically includes it in search results — no manual maintenance needed.