Devlog #17 — Building Fast Search & Category Filtering for 250+ Simulations

250+ simulations indexed

<5 ms search latency

18 KB search index gzipped

dependencies

The Problem: 250 Items in a Grid

The original homepage showed all simulations in a responsive grid: fine at 50, manageable at 100, unnavigable at 250. Analytics confirmed the problem: users arriving from search engines landed directly on specific simulation pages, but users landing on the homepage scrolled for a few seconds and then left.

We needed search and filtering. The constraints: no server, no build step, no external search service. Everything had to run in the browser from a static JSON file.

Step 1 — The Search Index JSON

A Python script walks every simulation directory, reads metadata from each index.html (title, description, category tags, keywords), and emits a compact search-index.json:

          // search-index.json (abbreviated)
          [
            {
              "id": "double-slit",
              "title": "Double-Slit Experiment",
              "desc": "Quantum wave-particle duality...",
              "cats": ["quantum", "optics"],
              "tags": ["interference", "wave", "photon"],
              "difficulty": "intermediate"
            },
            ...
          ]
        

The full index for 250 simulations is 142 KB uncompressed, 18 KB gzipped: small enough to download quickly on a first visit and to be served from the browser's HTTP cache on repeat visits.

Step 2 — Inverted Index for Full-Text Search

A plain array scan of 250 items on every keystroke would be fast enough (250 objects is trivial for a CPU), but we wanted prefix matching and ranked results. An inverted index maps every word token to the list of document IDs containing it:

          function buildInvertedIndex(docs) {
            const index = new Map(); // token → Set of doc IDs
            for (const doc of docs) {
              const tokens = tokenise(doc.title + ' ' + doc.desc + ' ' + doc.tags.join(' '));
              for (const tok of tokens) {
                if (!index.has(tok)) index.set(tok, new Set());
                index.get(tok).add(doc.id);
              }
            }
            return index;
          }

          function tokenise(text) {
            return text
              .toLowerCase()
              .replace(/[^a-z0-9 ]/g, ' ')
              .split(/\s+/)
              // drop tokens shorter than 3 characters (a crude stand-in for a stop-word list)
              .filter(t => t.length > 2);
          }
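
The ranking itself is not shown above. As a minimal sketch, assume the simplest possible scheme: each document earns one point per distinct query token it contains. The function name `rankedSearch`, the scoring rule, and the `ripple-tank` sample document are illustrative assumptions, not the site's actual ranking or data.

```javascript
// Same tokeniser as above: lowercase, strip punctuation, drop tokens under 3 chars.
function tokenise(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9 ]/g, ' ')
    .split(/\s+/)
    .filter(t => t.length > 2);
}

function buildInvertedIndex(docs) {
  const index = new Map(); // token -> Set of doc IDs
  for (const doc of docs) {
    const text = doc.title + ' ' + doc.desc + ' ' + doc.tags.join(' ');
    for (const tok of tokenise(text)) {
      if (!index.has(tok)) index.set(tok, new Set());
      index.get(tok).add(doc.id);
    }
  }
  return index;
}

// Hypothetical ranking: one point per distinct query token found in the doc.
function rankedSearch(index, query) {
  const scores = new Map(); // doc ID -> score
  for (const tok of new Set(tokenise(query))) {
    for (const id of index.get(tok) ?? []) {
      scores.set(id, (scores.get(id) ?? 0) + 1);
    }
  }
  // Highest score first; ties broken alphabetically by ID for stable output.
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .map(([id]) => id);
}

// Sample data: the second entry is made up for the example.
const docs = [
  { id: 'double-slit', title: 'Double-Slit Experiment',
    desc: 'Quantum wave-particle duality', tags: ['interference', 'wave', 'photon'] },
  { id: 'ripple-tank', title: 'Ripple Tank',
    desc: 'Water wave interference', tags: ['wave'] },
];
const index = buildInvertedIndex(docs);
const results = rankedSearch(index, 'quantum wave');
console.log(results); // double-slit matches both tokens, ripple-tank only "wave"
```

A real ranking would probably weight title matches above description matches; the point here is only that the inverted index makes per-token scoring a handful of Map lookups.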
        

Step 3 — Trie for Prefix Matching

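Users type "fluid" and expect "fluid dynamics" and "SPH fluid" to appear. An inverted index only matches exact tokens. A trie (prefix tree) solves this: each node represents one character, and each path from the root to a word's final node spells out a complete token. Because every node stores the IDs of all documents below it, finding the documents with a word that starts with "flu" is O(prefix_length), constant with respect to library size. One caveat: a prefix trie does not match "microfluid" for the query "fluid", since "fluid" is not a prefix of that token; covering that case would mean also inserting each token's suffixes.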

          class Trie {
            constructor() {
              this.root = {};
            }

            insert(word, docId) {
              let node = this.root;
              for (const ch of word) {
                if (!node[ch]) node[ch] = { _ids: new Set() };
                node = node[ch];
                // record the ID on every prefix node along the path
                node._ids.add(docId);
              }
            }

            search(prefix) {
              let node = this.root;
              for (const ch of prefix) {
                if (!node[ch]) return new Set();
                node = node[ch];
              }
              // all docs containing a word starting with prefix
              // (the root has no _ids, so an empty prefix yields an empty Set)
              return node._ids ?? new Set();
            }
          }
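
To show how the trie might be used for a multi-word query, here is a small sketch that intersects the per-term results, so every typed term must prefix-match some token of a document. The helper name `prefixSearch`, the AND semantics, and the sample tokens and IDs are assumptions made for the example.

```javascript
class Trie {
  constructor() {
    this.root = {};
  }
  insert(word, docId) {
    let node = this.root;
    for (const ch of word) {
      if (!node[ch]) node[ch] = { _ids: new Set() };
      node = node[ch];
      node._ids.add(docId); // every prefix node remembers the doc
    }
  }
  search(prefix) {
    let node = this.root;
    for (const ch of prefix) {
      if (!node[ch]) return new Set();
      node = node[ch];
    }
    return node._ids ?? new Set();
  }
}

// Hypothetical helper: every query term must prefix-match a token of the doc.
function prefixSearch(trie, query) {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  if (terms.length === 0) return new Set();
  let hits = null;
  for (const term of terms) {
    const ids = trie.search(term);
    hits = hits === null
      ? new Set(ids)
      : new Set([...hits].filter(id => ids.has(id)));
  }
  return hits;
}

// Illustrative tokens and IDs, not taken from the real index.
const trie = new Trie();
trie.insert('sph', 'sph-fluid');
trie.insert('fluid', 'sph-fluid');
trie.insert('fluid', 'fluid-dynamics');
trie.insert('dynamics', 'fluid-dynamics');
trie.insert('microfluid', 'microfluidics-chip');

const flu = prefixSearch(trie, 'flu');            // both "fluid" docs
const fluidDyn = prefixSearch(trie, 'fluid dyn'); // only fluid-dynamics
const micro = prefixSearch(trie, 'micro');        // only microfluidics-chip
```

Note that `'flu'` does not return `microfluidics-chip`: the match is anchored at the start of each token.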
        

Step 4 — Multi-Tag Category Filtering

The filter panel uses bit-flag intersection. Each category is assigned a bit position; every simulation is represented as a bitmask of its categories. Multi-tag filtering is a single bitwise AND:

          // Build bitmask for each sim at load time
          const CAT_BITS = { physics: 1, quantum: 2, biology: 4, chemistry: 8, economics: 16 };
          sims.forEach(s => {
            s.mask = s.cats.reduce((acc, c) => acc | (CAT_BITS[c] ?? 0), 0);
          });

          // Check if sim matches all selected filters
          function matchesFilter(sim, selectedMask) {
            return selectedMask === 0 || (sim.mask & selectedMask) === selectedMask;
          }
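
A short usage sketch may make the AND semantics concrete. It assumes the selected categories arrive as an array of names (for example from checked boxes in the filter panel); the helper `toMask` and the sample sims are hypothetical.

```javascript
const CAT_BITS = { physics: 1, quantum: 2, biology: 4, chemistry: 8, economics: 16 };

// Hypothetical helper: fold a list of category names into one bitmask.
// Unknown names contribute 0, so they are silently ignored.
function toMask(cats) {
  return cats.reduce((acc, c) => acc | (CAT_BITS[c] ?? 0), 0);
}

function matchesFilter(sim, selectedMask) {
  return selectedMask === 0 || (sim.mask & selectedMask) === selectedMask;
}

// Illustrative sims, not real index entries.
const sims = [
  { id: 'double-slit', cats: ['quantum', 'physics'] },
  { id: 'reaction-kinetics', cats: ['chemistry'] },
  { id: 'enzyme-binding', cats: ['biology', 'chemistry'] },
];
sims.forEach(s => { s.mask = toMask(s.cats); });

// Selecting chemistry AND biology: 8 | 4 = 12.
const selected = toMask(['chemistry', 'biology']);
const matches = sims.filter(s => matchesFilter(s, selected)).map(s => s.id);

// With nothing selected the mask is 0 and every sim passes.
const unfiltered = sims.filter(s => matchesFilter(s, 0)).map(s => s.id);
```

One design limit worth knowing: JavaScript bitwise operators work on 32-bit integers, so this scheme tops out at 32 categories before it needs BigInt or a second mask.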
        

Step 5 — URL State Serialisation

The search query and active filters are serialised into the URL on every change, so users can bookmark and share filtered views: /?q=fluid&cat=physics,chemistry&diff=beginner (URLSearchParams writes the comma percent-encoded as %2C; both forms parse back to the same value). The state is read back on page load and the UI is restored without any page transition.

          function pushState(query, cats, difficulty) {
            const params = new URLSearchParams();
            if (query) params.set('q', query);
            if (cats.length) params.set('cat', cats.join(','));
            if (difficulty) params.set('diff', difficulty);
            // replaceState keeps the back button clean; avoid a bare "?" when nothing is set
            const qs = params.toString();
            history.replaceState({}, '', qs ? '?' + qs : location.pathname);
          }
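
The read-back half is not shown above. A minimal sketch of what it could look like, assuming the same `q`, `cat`, and `diff` parameter names; the function name `readState` and the shape of the returned object are assumptions.

```javascript
// Hypothetical counterpart to pushState: parse a query string into UI state.
function readState(search) {
  const params = new URLSearchParams(search); // a leading "?" is accepted
  const cat = params.get('cat');              // null when the key is absent
  return {
    query: params.get('q') ?? '',
    cats: cat ? cat.split(',') : [],
    difficulty: params.get('diff') ?? '',
  };
}

// In the browser this would be readState(window.location.search).
const state = readState('?q=fluid&cat=physics,chemistry&diff=beginner');

// The percent-encoded comma that URLSearchParams emits decodes the same way.
const encoded = readState('?cat=physics%2Cchemistry');

// No parameters at all yields the empty default state.
const empty = readState('');
```

Returning plain defaults (empty string, empty array) rather than null keeps the UI restore code free of null checks.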
        

Performance Results

Before: 250 visible. All simulations always rendered: 250 DOM nodes, layout thrash on resize.

After: <5 ms search. Trie lookup + bitmask filter + DOM update for matching sims only, imperceptible on any device.

Before: ~40 s. Time to find a specific simulation by scrolling, for first-time visitors.

After: ~3 s. Time to find a simulation with 2-character prefix search and one category filter.

Lessons Learned

Don't reach for a library first. Lunr.js and Fuse.js are great but 20–40 KB gzipped. Our custom solution is 2 KB including the trie and was easier to tune.
Debounce the search input. The input event fires on every keystroke; with a 120 ms debounce, the search runs once typing pauses and skips the intermediate states entirely.
Pre-compute bitmasks at load, not at search time. The index-build runs once on page load (~8 ms); every subsequent search is pure lookup.
Virtual scrolling would help at 500+ items — for now, CSS content-visibility: auto gives a free 60% render cost reduction for off-screen cards.
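
For the debounce point, a minimal sketch of the idea follows. The `timers` parameter is an assumption added so the example can run synchronously with fake timers; in the page it would simply default to the real `setTimeout` and `clearTimeout`.

```javascript
// Minimal debounce: run fn only after waitMs have passed without another call.
// The default wrappers avoid calling setTimeout with a foreign `this`.
function debounce(fn, waitMs, timers = {
  set: (f, ms) => setTimeout(f, ms),
  clear: (h) => clearTimeout(h),
}) {
  let handle = null;
  return (...args) => {
    if (handle !== null) timers.clear(handle); // cancel the pending call
    handle = timers.set(() => {
      handle = null;
      fn(...args);
    }, waitMs);
  };
}

// Fake timers so the example runs without waiting.
const pending = new Map();
let nextId = 1;
const fakeTimers = {
  set: (f) => { const id = nextId++; pending.set(id, f); return id; },
  clear: (id) => pending.delete(id),
};

const seen = [];
const onInput = debounce(q => seen.push(q), 120, fakeTimers);

// Three fast keystrokes: each call cancels the previous pending one.
onInput('f');
onInput('fl');
onInput('flu');
const scheduled = pending.size;

// Simulate the 120 ms pause by firing whatever is still scheduled.
for (const f of [...pending.values()]) f();
```

In the page, the debounced function would be attached as the handler for the search box's `input` event, with the search routine passed in as `fn`.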

Open architecture: The search index JSON is generated by a Python script that reads simulation metadata. Adding a new simulation automatically includes it in search results — no manual maintenance needed.