Before shipping v0.1, we wanted to know: does every advertised feature actually work? Not in a developer's local setup — in clean containers, from a fresh install, with real vault content.
So we spun up three Podman containers on Fedora 41, each testing a different install mode, and ran 66 tests across 25 features. Here's what we found.
## Test Infrastructure
Each container started from a bare Fedora 41 image with only Python 3.13 and gcc installed. NeuroStack's install.sh handled everything else — uv, the repo clone, and dependency installation.
| Container | Mode | Network | What it tests |
|---|---|---|---|
| ns-e2e-lite | Lite (no GPU) | Isolated | FTS5 search, graph, scaffold, onboard, watch, MCP serve |
| ns-e2e-full | Full + Ollama | Host (GPU access) | Embeddings, semantic search, summaries, triples, tiered |
| ns-e2e-community | Community + Leiden | Host (GPU access) | Leiden clustering, community detection, cross-cluster queries |
The full-mode container connected to host Ollama instances — nomic-embed-text on GPU 0 (port 11435) for embeddings, and qwen2.5:3b on GPU 1 (port 11434) for summaries and triple extraction.
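The routing between the two instances can be sketched as follows. The ports and model names come from the test configuration above, but `endpoint_for()` itself is a hypothetical helper for illustration, not part of NeuroStack's actual API:

```python
# Routing sketch for the two host Ollama instances described above.
# Ports and model names match the test setup; endpoint_for() is a
# hypothetical helper, not NeuroStack's real interface.
EMBED_URL = "http://localhost:11435"     # nomic-embed-text on GPU 0
GENERATE_URL = "http://localhost:11434"  # qwen2.5:3b on GPU 1

def endpoint_for(task: str) -> tuple[str, str]:
    """Return (base URL, model) for a pipeline task."""
    if task == "embed":
        return EMBED_URL, "nomic-embed-text"
    if task in ("summarise", "triples"):
        return GENERATE_URL, "qwen2.5:3b"
    raise ValueError(f"unknown task: {task}")
```

Splitting embedding and generation across GPUs like this keeps the small embedding model from being evicted every time the 3B model loads.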
## Results at a Glance
| Container | Mode | Passed | Failed | Warnings | Verdict |
|---|---|---|---|---|---|
| ns-e2e-lite | Lite | 25 | 0 | 3 | PASS |
| ns-e2e-full | Full + Ollama | 31 | 0 | 1 | PASS |
| ns-e2e-community | Community | 10 | 1 | 4 | PARTIAL |
| Total | | 66 | 1 | 8 | |
## The Full ML Pipeline Works
The headline result: the full-mode pipeline is solid. From a cold install on Fedora 41 with Python 3.13, NeuroStack indexed 14 notes into 45 chunks, embedded every chunk, summarised every note, and built 37 graph edges — all automatically.
Hybrid search scored 0.7734 on a natural-language query ("how does the hippocampus index memories"), correctly surfacing the hippocampal-indexing note. Predictive-coding notes appeared at 0.7514 — meaning the embeddings capture conceptual relationships, not just keywords.
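Hybrid search of this kind typically blends a lexical (FTS5/BM25) score with embedding cosine similarity. A minimal sketch of that blending, assuming a simple weighted average (`alpha` is an illustrative weight, not NeuroStack's actual value):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def hybrid_score(lexical, semantic, alpha=0.5):
    """Blend a lexical (BM25/FTS5) score with a semantic one.
    alpha is an assumed weight, not NeuroStack's actual value."""
    return alpha * lexical + (1 - alpha) * semantic
```

The semantic component is what lets conceptually related notes (predictive coding, in this case) score close behind exact topical matches.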

## Tiered Search Saves Tokens
Tiered search is NeuroStack's token-efficient retrieval mode. Instead of dumping full note content into your AI's context window, it escalates through triples → summaries → chunks, sending the minimum context needed.
In the test, asking "how does sleep help memory" returned 9 triples and 3 summaries — structured facts like "Spaced Repetition enhances memory retention" and concise note summaries. The system auto-selected triples+summaries depth, skipping full chunks entirely.
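The escalation above can be sketched as a token-budgeted walk through the tiers. The budget mechanism here is assumed for illustration; the real depth-selection logic may differ:

```python
def tiered_context(triples, summaries, chunks, budget):
    """Escalate triples -> summaries -> chunks, stopping once the
    token budget would be exceeded. Items are (text, token_count)
    pairs; the budget mechanism is assumed for illustration."""
    context, used = [], 0
    for tier in (triples, summaries, chunks):
        for item, tokens in tier:
            if used + tokens > budget:
                return context  # stop escalating: budget spent
            context.append(item)
            used += tokens
    return context
```

With cheap triples first and expensive chunks last, a modest budget naturally produces the triples+summaries depth seen in the test, with chunks skipped entirely.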

## Graph and Brief
The wiki-link graph correctly mapped note connections. Hippocampal-indexing had a PageRank of 0.3052 with 9 inlinks and 3 outlinks — the clear hub node linking to predictive-coding, sleep-consolidation, tolman-cognitive-maps, and 6 more.
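For reference, hub scores like the one above can be computed with standard power-iteration PageRank over the wiki-link graph. A minimal sketch on a toy graph, not NeuroStack's implementation:

```python
def pagerank(links, damping=0.85, iters=50):
    """Power-iteration PageRank over a dict of node -> [outlinked nodes].
    Dangling nodes distribute their rank uniformly."""
    nodes = list(links)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        # every node starts each round with the teleport share
        new = {u: (1 - damping) / n for u in nodes}
        for u in nodes:
            targets = links[u] or nodes  # dangling node -> everyone
            share = damping * rank[u] / len(targets)
            for v in targets:
                new[v] += share
        rank = new
    return rank
```

A node with many inlinks and few outlinks, like hippocampal-indexing here, accumulates rank from its neighbours and surfaces as the hub.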
The daily brief surfaced the 5 most-connected notes by PageRank, showed recent changes, and reported vault health. In full mode, it included AI-generated summaries alongside each hub note.

## Prediction Errors: Designing Stale Notes
To test NeuroStack's stale note detection, we created three deliberately misleading notes and mixed them into the vault:
- neural-network-architectures.md — An ML/deep learning note with wiki-links to hippocampal-indexing. Would match "neural" queries but is about AI, not neuroscience.
- docker-swarm-legacy.md — An outdated Docker Swarm guide linking to kubernetes-migration. Advocates Swarm over K8s while the vault has moved on.
- memory-palace-technique.md — A mnemonic study technique linking to hippocampal-indexing. Matches "memory" FTS queries but is a study hack, not neuroscience.
The Docker Swarm note leaked into a "container orchestration with kubernetes" query at score 0.677 — exactly the kind of cross-contamination prediction-errors is designed to catch. However, the feature correctly returned no flags on a fresh vault because it needs accumulated retrieval events over time to build statistical signal. This is the right behaviour: false positives in a new vault would be worse than gradual detection.
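One plausible shape for that accumulated signal (purely an illustrative guess, since the actual statistic isn't documented here) is to log retrieval events and only flag a note once it has repeatedly surfaced well below the top hit:

```python
from collections import defaultdict

MIN_EVENTS = 5  # assumed minimum history; a fresh vault never reaches it

def flag_stale(events, gap=0.15):
    """events: (note, score, top_score) retrieval records.
    Flag notes that keep surfacing well below the best hit.
    This statistic is an illustrative stand-in, not the real one."""
    gaps = defaultdict(list)
    for note, score, top in events:
        gaps[note].append(top - score)
    return [note for note, g in gaps.items()
            if len(g) >= MIN_EVENTS and sum(g) / len(g) > gap]
```

The minimum-history threshold is what makes the fresh-vault behaviour correct by construction: with no accumulated events, nothing can be flagged.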

## Bugs Found
Five bugs surfaced during testing. None are blockers, but they're worth fixing before the next release:
1. The MCP tool is `vault_remember`, but the CLI equivalent is `memories add`, not `memories save`. Docs and CLI should align.
2. Unconditional `import numpy` at `cli.py:352`. Every other command handles a missing numpy gracefully; this one doesn't.
3. The "Embedding service unavailable" warning goes to stdout, corrupting JSON output. It should go to stderr when `--json` is set.
4. `communities build` requires notes to share extracted entities, not just wiki-links; 12 notes with 75 triples wasn't enough. The threshold should fall back to the wiki-link graph when triples are sparse.
5. The module exports `search_communities` and `global_query`, but the README implies `community_query`. Internal naming should be consistent.
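A minimal sketch of the fallback proposed for the `communities build` bug, with an assumed sparsity threshold (`min_edges` and the function itself are illustrative, not NeuroStack's code):

```python
def community_edges(entity_edges, wikilink_edges, min_edges=10):
    """Build the clustering graph from shared-entity edges, falling
    back to the wiki-link graph when entity edges are sparse.
    min_edges is an assumed threshold, not the real one."""
    if len(entity_edges) >= min_edges:
        return entity_edges, "entities"
    return wikilink_edges, "wikilinks"
```

On small vaults the wiki-link graph is almost always denser than the entity graph, so this fallback would have let Leiden clustering run in the community container.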
## What Worked Well
- install.sh — Flawless across all 3 modes on Fedora 41 with Python 3.13. Zero manual intervention.
- Hybrid search quality — Scores of 0.77+ for relevant results. Semantic search correctly finds conceptual matches.
- Scaffold packs — The researcher pack created 16 items including templates and seed notes. Genuine time-saver.
- Watch mode — Detected a new file within 3 seconds and auto-indexed it.
- Doctor diagnostics — Clean output with graceful degradation messaging for each missing component.
- Brief — Genuinely useful morning overview: recent changes, hub notes, vault health.
## Next Steps
The five bugs are tracked and will be fixed in the next patch. The community detection threshold is the highest priority — it's the only feature that doesn't work on small vaults. Everything else is polish.
If you want to try NeuroStack yourself, the install is one line:
```bash
curl -fsSL https://raw.githubusercontent.com/raphasouthall/neurostack/main/install.sh | bash
```

Full mode with local AI (requires Ollama):

```bash
curl -fsSL https://raw.githubusercontent.com/raphasouthall/neurostack/main/install.sh | NEUROSTACK_MODE=full bash
```