NeuroStack v0.1 — E2E Test Report Across 3 Install Modes

Tags: release, testing, engineering

Before shipping v0.1, we wanted to know: does every advertised feature actually work? Not in a developer's local setup — in clean containers, from a fresh install, with real vault content.

So we spun up three Podman containers on Fedora 41, each testing a different install mode, and ran 66 tests across 25 features. Here's what we found.

Test Infrastructure

Each container started from a bare Fedora 41 image with only Python 3.13 and gcc installed. NeuroStack's install.sh handled everything else — uv, the repo clone, and dependency installation.

| Container | Mode | Network | What it tests |
| --- | --- | --- | --- |
| ns-e2e-lite | Lite (no GPU) | Isolated | FTS5 search, graph, scaffold, onboard, watch, MCP serve |
| ns-e2e-full | Full + Ollama | Host (GPU access) | Embeddings, semantic search, summaries, triples, tiered |
| ns-e2e-community | Community + Leiden | Host (GPU access) | Leiden clustering, community detection, cross-cluster queries |

The full-mode container connected to host Ollama instances — nomic-embed-text on GPU 0 (port 11435) for embeddings, and qwen2.5:3b on GPU 1 (port 11434) for summaries and triple extraction.

Results at a Glance

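| Container | Mode | Passed | Failed | Warnings | Verdict |
| --- | --- | --- | --- | --- | --- |
| ns-e2e-lite | Lite | 25 | 0 | 3 | PASS |
| ns-e2e-full | Full + Ollama | 31 | 0 | 1 | PASS |
| ns-e2e-community | Community | 10 | 1 | 4 | PARTIAL |
| Total | | 66 | 1 | 8 | |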

The Full ML Pipeline Works

The headline result: the full-mode pipeline is solid. From a cold install on Fedora 41 with Python 3.13, NeuroStack indexed 14 notes into 45 chunks, embedded every chunk, summarised every note, and built 37 graph edges — all automatically.

  • 45 chunks embedded (100%)
  • 14 notes summarised (100%)
  • 37 graph edges (wiki-link derived)
  • 0.77 search score (top hit relevance)

Hybrid search scored 0.7734 on a natural-language query ("how does the hippocampus index memories"), correctly surfacing the hippocampal-indexing note. Predictive-coding notes appeared at 0.7514 — meaning the embeddings capture conceptual relationships, not just keywords.

Figure: NeuroStack hybrid search results from the E2E test. Hybrid search combining FTS5 keywords with semantic embeddings. Real scores from the test run.
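The post does not say how the keyword and embedding scores are merged, so the following is only a minimal sketch of one common approach: a weighted blend of a normalised keyword score and cosine similarity. The function names, the `alpha` weight, and the toy vectors are all hypothetical, and the keyword scores are assumed to be non-negative with higher meaning better.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_scores(keyword_scores, query_vec, note_vecs, alpha=0.5):
    """Blend keyword and semantic scores (alpha is a hypothetical weight)."""
    # Scale keyword scores into [0, 1] so both signals are comparable.
    top = max(keyword_scores.values(), default=0.0) or 1.0
    scores = {}
    for note, vec in note_vecs.items():
        keyword = keyword_scores.get(note, 0.0) / top
        semantic = cosine(query_vec, vec)
        scores[note] = alpha * keyword + (1 - alpha) * semantic
    # Highest blended score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data: the keyword index prefers the mnemonic note, the embeddings do not.
keyword_scores = {"memory-palace-technique": 2.0, "hippocampal-indexing": 1.0}
note_vecs = {
    "hippocampal-indexing": [1.0, 0.0],
    "memory-palace-technique": [0.0, 1.0],
    "predictive-coding": [0.8, 0.6],
}
ranked = hybrid_scores(keyword_scores, [1.0, 0.0], note_vecs)
print(ranked[0][0])
```

With these toy inputs the semantically closest note ranks first even though a keyword-only ranking would prefer the mnemonic note, and a note with no keyword hit at all still receives a score.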

Tiered Search Saves Tokens

Tiered search is NeuroStack's token-efficient retrieval mode. Instead of dumping full note content into your AI's context window, it escalates through triples → summaries → chunks, sending the minimum context needed.

In the test, asking "how does sleep help memory" returned 9 triples and 3 summaries — structured facts like "Spaced Repetition enhances memory retention" and concise note summaries. The system auto-selected triples+summaries depth, skipping full chunks entirely.
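The escalation can be sketched as a loop over ordered tiers that stops as soon as the accumulated context is judged sufficient. This is an illustration only: `tiered_retrieve`, the `enough` callback, and the item-count cutoff are hypothetical, and the post does not describe the real depth heuristic.

```python
def tiered_retrieve(tiers, enough):
    """Walk tiers in order, stopping once `enough(context)` returns True.

    tiers:  ordered list of (tier_name, items), cheapest tier first.
    enough: hypothetical callback deciding whether the context suffices.
    """
    context = {}
    for name, items in tiers:
        context[name] = list(items)
        if enough(context):
            break
    # Dicts keep insertion order, so the label reflects the tiers visited.
    return "+".join(context), context


# Toy tiers sized like the test run: 9 triples, 3 summaries, 45 chunks.
tiers = [
    ("triples", [f"triple {i}" for i in range(9)]),
    ("summaries", [f"summary {i}" for i in range(3)]),
    ("chunks", [f"chunk {i}" for i in range(45)]),
]


def enough(context):
    # Hypothetical rule: stop once at least 10 items have been gathered.
    return sum(len(items) for items in context.values()) >= 10


depth, context = tiered_retrieve(tiers, enough)
print(depth)  # triples+summaries
```

Because the loop breaks before the chunk tier is touched, the most expensive content is never loaded when the cheaper tiers already satisfy the rule.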

Figure: Tiered search returns structured triples first, then summaries, using 96% fewer tokens than naive RAG.

Graph and Brief

The wiki-link graph correctly mapped note connections. Hippocampal-indexing had a PageRank of 0.3052 with 9 inlinks and 3 outlinks, making it the clear hub node, connected to predictive-coding, sleep-consolidation, tolman-cognitive-maps, and 6 more.
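For readers unfamiliar with PageRank, here is a minimal power-iteration sketch over a toy wiki-link graph. It is the textbook algorithm, not NeuroStack's implementation; the damping factor, iteration count, and link data are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank.

    links maps each note to the list of notes it links to.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # Dangling note: spread its rank evenly over all notes.
                share = damping * rank[node] / n
                for other in nodes:
                    new[other] += share
        rank = new
    return rank


# Toy graph: three notes link to the hub, which links back to two of them.
links = {
    "predictive-coding": ["hippocampal-indexing"],
    "sleep-consolidation": ["hippocampal-indexing"],
    "tolman-cognitive-maps": ["hippocampal-indexing"],
    "hippocampal-indexing": ["predictive-coding", "sleep-consolidation"],
}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # hippocampal-indexing
```

The note with the most inbound links accumulates the highest rank, which is the property the daily brief relies on when it surfaces hub notes.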

The daily brief surfaced the 5 most-connected notes by PageRank, showed recent changes, and reported vault health. In full mode, it included AI-generated summaries alongside each hub note.

Figure: NeuroStack graph neighborhood for hippocampal-indexing, showing PageRank scores and connection strength.

Prediction Errors — Designing Stale Notes

To test NeuroStack's stale note detection, we created three deliberately misleading notes and mixed them into the vault:

  • neural-network-architectures.md — An ML/deep learning note with wiki-links to hippocampal-indexing. Would match "neural" queries but is about AI, not neuroscience.
  • docker-swarm-legacy.md — An outdated Docker Swarm guide linking to kubernetes-migration. Advocates Swarm over K8s while the vault has moved on.
  • memory-palace-technique.md — A mnemonic study technique linking to hippocampal-indexing. Matches "memory" FTS queries but is a study hack, not neuroscience.

The Docker Swarm note leaked into a "container orchestration with kubernetes" query at score 0.677 — exactly the kind of cross-contamination prediction-errors is designed to catch. However, the feature correctly returned no flags on a fresh vault because it needs accumulated retrieval events over time to build statistical signal. This is the right behaviour: false positives in a new vault would be worse than gradual detection.

Figure: NeuroStack prediction errors flagging stale notes. What prediction-errors would surface after sustained usage: stale notes flagged with semantic distance scores.

Bugs Found

Five bugs surfaced during testing. None are blockers, but they're worth fixing before the next release:

Medium: memories CLI uses add, not save

The MCP tool is vault_remember but the CLI equivalent is memories add, not memories save. Docs and CLI should align.

Medium: folder-summaries crashes in lite mode

Unconditional import numpy at cli.py:352. Every other command handles missing numpy gracefully — this one doesn't.
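One way to get the graceful behaviour is to import the optional dependency inside the command and bail out with a message when it is missing. This is a sketch only: `load_optional` and `folder_summaries_command` are hypothetical names and do not reflect the actual contents of cli.py.

```python
import importlib
import sys


def load_optional(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def folder_summaries_command(load=load_optional):
    """Hypothetical command body that degrades instead of crashing."""
    np = load("numpy")
    if np is None:
        print(
            "folder-summaries needs numpy, which is not installed in lite mode.",
            file=sys.stderr,
        )
        return 1  # non-zero exit code, but no traceback
    # ... numpy-dependent work would go here ...
    return 0


if __name__ == "__main__":
    sys.exit(folder_summaries_command())
```

Passing the loader as a parameter keeps the missing-dependency path easy to test without uninstalling anything.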

Low: --json search emits warnings to stdout

The "Embedding service unavailable" warning goes to stdout, corrupting JSON output. Should go to stderr when --json is set.
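A minimal sketch of the fix, assuming a single function is responsible for printing results: route warnings to stderr whenever JSON output is requested so that stdout contains nothing but the JSON document. `emit_results` and its parameters are hypothetical, not NeuroStack's actual code.

```python
import io
import json
import sys


def emit_results(results, warnings, as_json, out=sys.stdout, err=sys.stderr):
    """Print search results, keeping stdout machine-readable in JSON mode."""
    # In JSON mode warnings must not share a stream with the payload.
    warning_stream = err if as_json else out
    for message in warnings:
        print(f"warning: {message}", file=warning_stream)
    if as_json:
        json.dump(results, out)
        out.write("\n")
    else:
        for item in results:
            print(f"{item['score']:.4f}  {item['path']}", file=out)


# Demonstration with in-memory streams instead of the real stdout/stderr.
out, err = io.StringIO(), io.StringIO()
emit_results(
    [{"path": "hippocampal-indexing.md", "score": 0.7734}],
    ["Embedding service unavailable"],
    as_json=True,
    out=out,
    err=err,
)
parsed = json.loads(out.getvalue())  # parses cleanly: no warning text mixed in
```

A consumer piping the command into a JSON parser then only ever sees valid JSON, while the warning remains visible in the terminal.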

High: Community detection returns 0 communities on small vaults

communities build requires notes to share extracted entities, not just wiki-links. 12 notes with 75 triples weren't enough. The build should fall back to the wiki-link graph when triples are sparse.
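The proposed fallback could look roughly like this: build edges from shared entities first, and switch to wiki-link edges when too few notes overlap. Everything here is a hypothetical sketch, including the function names, the input shapes, and the `min_edges` cutoff; it does not describe how communities build works today.

```python
from itertools import combinations


def shared_entity_edges(note_entities):
    """Undirected edges between notes that share at least one entity."""
    edges = set()
    for a, b in combinations(sorted(note_entities), 2):
        if note_entities[a] & note_entities[b]:
            edges.add((a, b))
    return edges


def clustering_edges(note_entities, wikilinks, min_edges=3):
    """Pick the edge set to cluster on (min_edges is a hypothetical cutoff)."""
    entity_edges = shared_entity_edges(note_entities)
    if len(entity_edges) >= min_edges:
        return "entities", entity_edges
    # Sparse triples: fall back to the wiki-link graph, treated as undirected.
    return "wiki-links", {tuple(sorted(pair)) for pair in wikilinks}


# Toy vault where no two notes share an extracted entity.
note_entities = {
    "hippocampal-indexing": {"hippocampus"},
    "sleep-consolidation": {"slow-wave sleep"},
    "kubernetes-migration": {"kubernetes"},
}
wikilinks = [("sleep-consolidation", "hippocampal-indexing")]
source, edges = clustering_edges(note_entities, wikilinks)
print(source)  # wiki-links
```

The returned label makes it possible to report which graph the clusters were built from, so users of small vaults can tell when the fallback was used.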

Low: community_search module naming inconsistency

The module exports search_communities and global_query, but the README implies community_query. Internal naming should be consistent.

What Worked Well

  • install.sh — Flawless across all 3 modes on Fedora 41 with Python 3.13. Zero manual intervention.
  • Hybrid search quality — Scores of 0.77+ for relevant results. Semantic search correctly finds conceptual matches.
  • Scaffold packs — The researcher pack created 16 items including templates and seed notes. Genuine time-saver.
  • Watch mode — Detected a new file within 3 seconds and auto-indexed it.
  • Doctor diagnostics — Clean output with graceful degradation messaging for each missing component.
  • Brief — Genuinely useful morning overview: recent changes, hub notes, vault health.

Next Steps

The five bugs are tracked and will be fixed in the next patch. The community detection threshold is the highest priority — it's the only feature that doesn't work on small vaults. Everything else is polish.

If you want to try NeuroStack yourself, the install is one line:

curl -fsSL https://raw.githubusercontent.com/raphasouthall/neurostack/main/install.sh | bash

Full mode with local AI (requires Ollama):

curl -fsSL https://raw.githubusercontent.com/raphasouthall/neurostack/main/install.sh | NEUROSTACK_MODE=full bash