14 March 2026 · 8 min · Raphael Southall

NeuroStack v0.1 — E2E Test Report Across 3 Install Modes

Name: NeuroStack
Author: Raphael Southall

Tags: release, testing, engineering

Before shipping v0.1, we wanted to know: does every advertised feature actually work? Not in a developer's local setup — in clean containers, from a fresh install, with real vault content.

So we spun up three Podman containers on Fedora 41, each testing a different install mode, and ran 66 tests across 25 features. Here's what we found.

Test Infrastructure

Each container started from a bare Fedora 41 image with only Python 3.13 and gcc installed. NeuroStack's install.sh handled everything else — uv, the repo clone, and dependency installation.

Container	Mode	Network	What it tests
ns-e2e-lite	Lite (no GPU)	Isolated	FTS5 search, graph, scaffold, onboard, watch, MCP serve
ns-e2e-full	Full + Ollama	Host (GPU access)	Embeddings, semantic search, summaries, triples, tiered
ns-e2e-community	Community + Leiden	Host (GPU access)	Leiden clustering, community detection, cross-cluster queries

The full-mode container connected to host Ollama instances — nomic-embed-text on GPU 0 (port 11435) for embeddings, and qwen2.5:3b on GPU 1 (port 11434) for summaries and triple extraction.

Results at a Glance

Container	Mode	Passed	Failed	Warnings	Verdict
ns-e2e-lite	Lite	25	0	3	PASS
ns-e2e-full	Full + Ollama	31	0	1	PASS
ns-e2e-community	Community	10	1	4	PARTIAL
Total		66	1	8

The Full ML Pipeline Works

The headline result: the full-mode pipeline is solid. From a cold install on Fedora 41 with Python 3.13, NeuroStack indexed 14 notes into 45 chunks, embedded every chunk, summarised every note, and built 37 graph edges — all automatically.

45 chunks embedded (100%)

14 notes summarised (100%)

37 graph edges (wiki-link derived)

0.77 search score (top hit relevance)

Hybrid search scored 0.7734 on a natural-language query ("how does the hippocampus index memories"), correctly surfacing the hippocampal-indexing note. Predictive-coding notes appeared at 0.7514, which suggests the embeddings capture conceptual relationships, not just keywords.
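One common way to blend keyword and embedding scores is a weighted sum after normalisation. The sketch below illustrates that general idea only. The function names, the min-max normalisation, and the 0.5 weight are assumptions, not NeuroStack's actual scoring code, and it assumes both inputs use "higher is better" scores.

```python
def min_max(scores):
    """Rescale a dict of raw scores to the 0..1 range."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every hit as equally strong.
        return {key: 1.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def hybrid_rank(keyword_scores, semantic_scores, alpha=0.5):
    """Blend keyword and semantic scores; alpha is the semantic weight (assumed)."""
    kw = min_max(keyword_scores)
    sem = min_max(semantic_scores)
    note_ids = set(kw) | set(sem)
    blended = {
        note: alpha * sem.get(note, 0.0) + (1 - alpha) * kw.get(note, 0.0)
        for note in note_ids
    }
    # Best match first.
    return sorted(blended.items(), key=lambda item: item[1], reverse=True)


# Made-up scores for illustration, not values from the test run.
ranked = hybrid_rank({"note-a": 2.0, "note-b": 1.0}, {"note-a": 0.9, "note-c": 0.5})
```

A note that scores well on both sides rises to the top, while a note found by only one method still appears lower in the list.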

Figure: NeuroStack hybrid search results from the E2E test. Hybrid search combines FTS5 keywords with semantic embeddings; the scores shown are from the test run.

Tiered Search Saves Tokens

Tiered search is NeuroStack's token-efficient retrieval mode. Instead of dumping full note content into your AI's context window, it escalates through triples → summaries → chunks, sending the minimum context needed.

In the test, asking "how does sleep help memory" returned 9 triples and 3 summaries — structured facts like "Spaced Repetition enhances memory retention" and concise note summaries. The system auto-selected triples+summaries depth, skipping full chunks entirely.
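The escalation described above can be sketched as a loop over tiers ordered from cheapest to most expensive, stopping as soon as the gathered context is judged sufficient. This is a minimal illustration rather than NeuroStack's implementation: `tiered_retrieve`, the fetcher callables, and the sufficiency check are hypothetical, and the placeholder strings stand in for real retrieval results.

```python
def tiered_retrieve(query, fetchers, is_sufficient):
    """Walk tiers in order and stop once enough context has been gathered.

    fetchers: ordered list of (tier_name, callable(query) -> list of str)
    is_sufficient: callable(context_so_far) -> bool
    """
    context = []
    tiers_used = []
    for tier_name, fetch in fetchers:
        context.extend(fetch(query))
        tiers_used.append(tier_name)
        if is_sufficient(context):
            break  # skip the more expensive tiers
    return tiers_used, context


# Stub fetchers with placeholder content (hypothetical, for illustration only).
fetchers = [
    ("triples", lambda q: ["Spaced Repetition enhances memory retention"]),
    ("summaries", lambda q: ["(note summary placeholder)"]),
    ("chunks", lambda q: ["(full chunk placeholder)"]),
]
tiers_used, context = tiered_retrieve(
    "how does sleep help memory",
    fetchers,
    is_sufficient=lambda ctx: len(ctx) >= 2,
)
```

With this stub, the loop stops after the summaries tier and never fetches full chunks, which mirrors the triples+summaries depth seen in the test.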

Figure: Tiered search showing triples and summaries. Tiered search returns structured triples first, then summaries: 96% fewer tokens than naive RAG.

Graph and Brief

The wiki-link graph correctly mapped note connections. Hippocampal-indexing had a PageRank of 0.3052 with 9 inlinks and 3 outlinks, making it the clear hub node, connected to predictive-coding, sleep-consolidation, tolman-cognitive-maps, and 6 more.
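For readers unfamiliar with PageRank, here is a small power-iteration sketch over a link graph. It is a textbook version with the conventional 0.85 damping factor, not NeuroStack's graph code, and the toy graph is invented rather than taken from the test vault.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank; links maps each note to the list of notes it links to."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by notes without outlinks is spread evenly over all notes.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {
            node: (1 - damping) / n + damping * dangling / n for node in nodes
        }
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Toy graph: three notes link to "hub", and "hub" links back to one of them.
toy_links = {"hub": ["a"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
ranks = pagerank(toy_links)
top_note = max(ranks, key=ranks.get)
```

The note with the most inbound links ends up with the highest score, which is the same property that makes a heavily linked note stand out as a hub.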

The daily brief surfaced the 5 most-connected notes by PageRank, showed recent changes, and reported vault health. In full mode, it included AI-generated summaries alongside each hub note.

Figure: NeuroStack graph neighborhood for hippocampal-indexing, showing PageRank scores and connection strength.

Prediction Errors — Designing Stale Notes

To test NeuroStack's stale note detection, we created three deliberately misleading notes and mixed them into the vault:

neural-network-architectures.md — An ML/deep learning note with wiki-links to hippocampal-indexing. Would match "neural" queries but is about AI, not neuroscience.
docker-swarm-legacy.md — An outdated Docker Swarm guide linking to kubernetes-migration. Advocates Swarm over K8s while the vault has moved on.
memory-palace-technique.md — A mnemonic study technique linking to hippocampal-indexing. Matches "memory" FTS queries but is a study hack, not neuroscience.

The Docker Swarm note leaked into a "container orchestration with kubernetes" query at score 0.677 — exactly the kind of cross-contamination prediction-errors is designed to catch. However, the feature correctly returned no flags on a fresh vault because it needs accumulated retrieval events over time to build statistical signal. This is the right behaviour: false positives in a new vault would be worse than gradual detection.
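The "wait for enough evidence" behaviour can be illustrated with a small log that flags a note only after a minimum number of retrieval events. Everything here is an assumption made for illustration: the `RetrievalLog` class, the distance measure, and both thresholds are hypothetical and do not describe how prediction-errors is actually implemented.

```python
from collections import defaultdict
from statistics import mean


class RetrievalLog:
    """Accumulate per-note query distances and flag notes that stay far off."""

    def __init__(self, min_events=20, distance_threshold=0.5):
        self.min_events = min_events                  # assumed minimum sample size
        self.distance_threshold = distance_threshold  # assumed cut-off
        self.distances = defaultdict(list)

    def record(self, note, distance):
        """Store one retrieval event: how far the note was from the query."""
        self.distances[note].append(distance)

    def flagged(self):
        """Return notes with enough events and a high mean distance."""
        return sorted(
            note
            for note, values in self.distances.items()
            if len(values) >= self.min_events
            and mean(values) > self.distance_threshold
        )


# Invented distances; a low min_events keeps the demonstration short.
log = RetrievalLog(min_events=3, distance_threshold=0.5)
log.record("docker-swarm-legacy", 0.8)
log.record("docker-swarm-legacy", 0.7)
flags_early = log.flagged()   # too few events, nothing flagged yet
log.record("docker-swarm-legacy", 0.9)
for _ in range(3):
    log.record("kubernetes-migration", 0.1)
flags_later = log.flagged()
```

On a fresh vault the log is empty or sparse, so nothing is flagged, which matches the behaviour observed in the test.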

Figure: NeuroStack prediction errors flagging stale notes. This is what prediction-errors would surface after sustained usage: stale notes flagged with semantic distance scores.

Bugs Found

Five bugs surfaced during testing. None are blockers, but they're worth fixing before the next release:

Medium: memories CLI uses add, not save

The MCP tool is vault_remember but the CLI equivalent is memories add, not memories save. Docs and CLI should align.

Medium: folder-summaries crashes in lite mode

Unconditional import numpy at cli.py:352. Every other command handles missing numpy gracefully — this one doesn't.
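A typical fix for this kind of crash is to import the optional dependency inside the command and exit with a clear message when it is missing. The sketch below shows the pattern only: `load_optional` and `folder_summaries_command` are hypothetical names, and the message text is invented, so none of it reflects the real contents of cli.py.

```python
import importlib
import sys


def load_optional(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def folder_summaries_command():
    """Hypothetical command body that degrades gracefully without numpy."""
    np = load_optional("numpy")
    if np is None:
        print(
            "folder-summaries needs numpy, which is not installed in this mode.",
            file=sys.stderr,
        )
        return 1  # non-zero exit code instead of a traceback
    # numpy-dependent work would go here
    return 0
```

Deferring the import to the point of use means the other lite-mode commands are unaffected whether or not numpy is present.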

Low: --json search emits warnings to stdout

The "Embedding service unavailable" warning goes to stdout, corrupting JSON output. Should go to stderr when --json is set.
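The usual remedy is to route diagnostics to stderr whenever machine-readable output is requested. Here is a minimal sketch of that routing, assuming a hypothetical `emit_results` helper; the result fields and message format are illustrative, not NeuroStack's actual output.

```python
import io
import json
import sys


def emit_results(results, warnings, as_json, out=None, err=None):
    """Print search results, keeping stdout clean when JSON is requested."""
    out = out if out is not None else sys.stdout
    err = err if err is not None else sys.stderr
    # In JSON mode, warnings go to stderr so stdout holds only valid JSON.
    warning_stream = err if as_json else out
    for message in warnings:
        print(f"warning: {message}", file=warning_stream)
    if as_json:
        print(json.dumps(results), file=out)
    else:
        for item in results:
            print(f"{item['score']:.4f}  {item['note']}", file=out)


# In-memory buffers stand in for stdout and stderr in this demonstration.
stdout_buf, stderr_buf = io.StringIO(), io.StringIO()
emit_results(
    [{"note": "hippocampal-indexing", "score": 0.7734}],
    ["Embedding service unavailable"],
    as_json=True,
    out=stdout_buf,
    err=stderr_buf,
)
parsed = json.loads(stdout_buf.getvalue())  # parses because no warning is mixed in
```

With this split, a caller piping the output into a JSON parser still sees the warning on the terminal without it corrupting the payload.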

High: Community detection returns 0 communities on small vaults

communities build requires notes to share extracted entities, not just wiki-links. 12 notes with 75 triples weren't enough. The build should fall back to the wiki-link graph when triples are sparse.
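The proposed fallback could look roughly like this: build edges from shared entities first, and use wiki-link edges when too few entity edges exist. This is a sketch of the idea under stated assumptions. The function names and the `min_edges` threshold are hypothetical, and it only chooses which edge set to hand to the clustering step. It does not run Leiden itself.

```python
from itertools import combinations


def shared_entity_edges(note_entities):
    """Connect two notes when their extracted entity sets overlap."""
    edges = set()
    for a, b in combinations(sorted(note_entities), 2):
        if note_entities[a] & note_entities[b]:
            edges.add((a, b))
    return edges


def edges_for_clustering(note_entities, wiki_links, min_edges=1):
    """Prefer entity edges; fall back to wiki-links when they are too sparse."""
    entity_edges = shared_entity_edges(note_entities)
    if len(entity_edges) >= min_edges:
        return "entities", entity_edges
    # Normalise link direction so (a, b) and (b, a) count as one edge.
    wiki_edges = {tuple(sorted(pair)) for pair in wiki_links}
    return "wiki-links", wiki_edges


# Invented example: no shared entities, so the wiki-link graph is used.
source, edges = edges_for_clustering(
    {"note-a": {"hippocampus"}, "note-b": {"docker"}},
    wiki_links=[("note-b", "note-a")],
)
```

On a small vault with sparse triples, this would still give the clustering step a non-empty graph to work with.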

Low: community_search module naming inconsistency

The module exports search_communities and global_query, but the README implies community_query. Internal naming should be consistent.

What Worked Well

install.sh — Flawless across all 3 modes on Fedora 41 with Python 3.13. Zero manual intervention.
Hybrid search quality — Scores of 0.77+ for relevant results. Semantic search correctly finds conceptual matches.
Scaffold packs — The researcher pack created 16 items including templates and seed notes. Genuine time-saver.
Watch mode — Detected a new file within 3 seconds and auto-indexed it.
Doctor diagnostics — Clean output with graceful degradation messaging for each missing component.
Brief — Genuinely useful morning overview: recent changes, hub notes, vault health.

Next Steps

The five bugs are tracked and will be fixed in the next patch. The community detection threshold is the highest priority — it's the only feature that doesn't work on small vaults. Everything else is polish.

If you want to try NeuroStack yourself, the install is one line:

curl -fsSL https://raw.githubusercontent.com/raphasouthall/neurostack/main/install.sh | bash

Full mode with local AI (requires Ollama):

curl -fsSL https://raw.githubusercontent.com/raphasouthall/neurostack/main/install.sh | NEUROSTACK_MODE=full bash