CogniRelay

Reviewer Guide

This guide is the fastest way to evaluate CogniRelay as a system rather than as a list of endpoints.

Use it before diving into the API details.

What CogniRelay Is

CogniRelay is a self-hosted continuity and collaboration substrate for autonomous agents.

It is not meant to be a generic file server or a generic task app. Its job is to help agents:

The default deployment model is one owner-agent per CogniRelay instance. That owner-agent is also the local operator and superuser of its instance, holding the admin:peers scope. Continuity capsules are the owner-agent’s local continuity substrate; namespace enforcement supports sub-directory granularity, so collaborator tokens can be scoped to memory/coordination without access to memory/continuity. If the owner-agent needs to coordinate with other agents, it issues narrower delegated API tokens to collaborating peers. Collaboration happens through the coordination surfaces (handoffs, shared artifacts, reconciliation records) rather than by treating continuity as shared common state. An agent that wants its own continuity should run its own instance.

The system is built around one simple operational idea:

git is the durable store; the API is the machine interface

What CogniRelay Is Not

CogniRelay is not:

The current implementation is intentionally narrower:

Why This Architecture

CogniRelay is built from a small number of deliberately constrained building blocks. Each choice optimizes for auditability, operational simplicity, and independence from external services.

Git as storage engine. All durable state lives in a local git repository managed through subprocess calls — no GitPython, no forge, no remote dependency. Git provides version history, diffs, rollback, and offline-first operation without requiring an external database. Every mutation is a commit, so the full history of what changed and when is always recoverable.
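The commit-per-mutation pattern can be sketched as follows. This is a minimal illustration, not CogniRelay's actual interface: the function name, path layout, and return convention are assumptions; the only claims taken from the document are "subprocess calls, no GitPython" and "every mutation is a commit."

```python
import subprocess
from pathlib import Path

def commit_mutation(repo: Path, rel_path: str, content: str, message: str) -> str:
    """Write one file and record the mutation as a git commit (sketch).

    Uses plain `git` subprocess calls only -- no GitPython, no remote.
    """
    target = repo / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    subprocess.run(["git", "-C", str(repo), "add", rel_path], check=True)
    subprocess.run(["git", "-C", str(repo), "commit", "-q", "-m", message], check=True)
    # Return the commit hash so callers can audit or roll back later.
    out = subprocess.run(
        ["git", "-C", str(repo), "rev-parse", "HEAD"],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()
```

Because each mutation is its own commit, `git log -- <path>` recovers the full change history of any stored artifact.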

Markdown for human-readable memory, JSON/JSONL for machine data. Durable facts, identity, and narrative memory are stored as Markdown with optional YAML frontmatter. Event streams, message records, delivery state, and structured artifacts use JSON or append-only JSONL. This split keeps memory inspectable by humans while giving agents efficient structured access.
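A sketch of that split, with illustrative helpers (file layout and frontmatter fields are assumptions, not CogniRelay's schema):

```python
import json
from pathlib import Path

def write_memory_note(path: Path, frontmatter: dict, body: str) -> None:
    """Human-readable memory: Markdown body with YAML frontmatter."""
    fm = "\n".join(f"{k}: {v}" for k, v in frontmatter.items())
    path.write_text(f"---\n{fm}\n---\n\n{body}\n", encoding="utf-8")

def append_event(path: Path, event: dict) -> None:
    """Machine data: one JSON object per line, append-only JSONL."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event, sort_keys=True) + "\n")
```

The Markdown side stays diff-friendly and readable in any editor; the JSONL side gives agents cheap structured appends without a database.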

SQLite FTS5 for search, with JSON-index fallback. Search uses Python’s stdlib sqlite3 module with an FTS5 virtual table — no external search service. If the SQLite database is missing or corrupt, the indexer falls back to derived JSON indexes with a simpler word-scoring algorithm. Both index layers are treated as derived state that can be rebuilt from the git-backed source of truth at any time.
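The two-layer lookup might look like this. The table schema, function signature, and word-scoring heuristic are illustrative assumptions; only the FTS5-with-JSON-fallback shape comes from the document:

```python
import sqlite3

def search(db_path: str, query: str, fallback_docs: dict[str, str]) -> list[str]:
    """Query the FTS5 index; fall back to naive word scoring if it is unusable."""
    try:
        conn = sqlite3.connect(db_path)
        rows = conn.execute(
            "SELECT path FROM docs WHERE docs MATCH ? ORDER BY rank", (query,)
        ).fetchall()
        return [r[0] for r in rows]
    except sqlite3.Error:
        # Degraded path: score each derived-index document by how many
        # query words its text contains, highest score first.
        words = query.lower().split()
        scored = [
            (sum(w in text.lower() for w in words), path)
            for path, text in fallback_docs.items()
        ]
        return [path for score, path in sorted(scored, reverse=True) if score > 0]
```

Either layer can be discarded and rebuilt, since both are derived from the git-backed source of truth.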

Self-contained bearer-token auth. Tokens are stored as SHA256 hashes in local config, scoped by operation and namespace. There is no OAuth provider, LDAP, or external auth dependency. The token model supports split read/write namespace restrictions, expiry, trust status, and audit logging — all locally managed.
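The hash-at-rest pattern can be sketched in a few lines (helper names are illustrative; the SHA256-hashed storage is the documented behavior):

```python
import hashlib
import hmac

def hash_token(token: str) -> str:
    """Tokens are stored only as SHA256 hashes, never in plaintext."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()

def verify_token(presented: str, stored_hash: str) -> bool:
    # Constant-time comparison avoids timing side channels on the check.
    return hmac.compare_digest(hash_token(presented), stored_hash)
```

Scope, namespace, expiry, and trust checks would then run against the matched token record.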

Compaction as planning, not summarization. The compaction service is an orchestrator that classifies candidates by age, size, memory class, and policy, then emits structured reports with action categories (summarize, archive, promote, keep, review). It does not generate summaries itself — the agent reads the plan and decides what to do. This keeps the system from making content decisions on the agent’s behalf.
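A toy classifier in that spirit, where all thresholds and class names are invented for illustration (the real policy inputs are age, size, memory class, and operator policy; the action vocabulary is from the document):

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    path: str
    age_days: int
    size_bytes: int
    memory_class: str  # e.g. "continuity", "coordination"

def plan_action(c: Candidate) -> str:
    """Emit an action category for the agent to act on; never rewrite content.

    Real categories also include "promote"; thresholds here are made up.
    """
    if c.memory_class == "continuity":
        return "keep"          # owner-curated memory is never auto-touched
    if c.age_days > 180:
        return "archive"
    if c.size_bytes > 64_000:
        return "summarize"     # the agent, not the system, writes the summary
    if c.age_days > 30:
        return "review"
    return "keep"
```

The key property is that the return value is a recommendation in a report, not a mutation.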

Minimal runtime dependencies. The entire stack runs on FastAPI, uvicorn, Pydantic, and python-dotenv. No ORM, no external database, no cache or queue library. This keeps the operational surface minimal and the system easy to deploy, audit, and reason about.

The Core Model

Bounded orientation preservation

CogniRelay treats continuity as a bounded orientation problem.

Continuity capsules are meant to preserve enough of the agent’s current direction to support a useful restart:

This is stronger than simple factual recall, but intentionally weaker than a full architecture for preserving every layer of texture or self-model.

Write-time curation rather than unlimited retention

The current continuity model is closer to bounded write-time curation than to unconstrained read-time pruning.

That matters because the motivating discussions distinguish two broad failure modes:

CogniRelay does not claim to eliminate that tradeoff. Instead it makes the tradeoff explicit:

The system therefore aims for inspectable loss, not imaginary losslessness.

Negative decisions are first-class enough to survive if recorded

One of the key design choices in the current system is that non-action can be represented directly.

The negative_decisions continuity field exists to preserve decisions such as:

This does not solve every compaction problem by itself. It does, however, prevent the system from modeling only what was done and thereby biasing successor agents toward action by omission.

Recovery Model

CogniRelay assumes blind spots are structural.

That means the recovery model is built around bounded usefulness under loss, not around a promise that the blind spot has been removed.

What the system tries to do

What the system does not claim

Practical reading for reviewers

When reviewing the system, treat these as key design claims:

Inter-Agent Authority Boundaries

The inter-agent model is deliberately conservative.

Access isolation model

Access isolation between agents is enforced entirely by token scopes and the namespace/path restrictions configured by the operator. Beyond that configured access model, the system provides no separate intrinsic identity-bound ownership or tenant-isolation layer.

Continuity capsules are namespace-gated, not agent-gated. Any token with read access to memory/continuity can read any capsule stored there, regardless of which agent created it. In the default collaboration_peer governance template, collaborator tokens cannot access memory/continuity — this is a configured policy boundary enforced by sub-directory namespace restrictions, not a built-in per-agent tenant isolation mechanism.

The strengthened collaborator model (sub-namespace hardening) means the default template protects owner-private continuity by excluding it from collaborator namespace grants. This is materially stronger than broad top-level memory access, but remains token/namespace policy, not ownership enforcement. Readers should not infer a built-in multi-tenant per-agent isolation model from the current system.

The collaborator token policy described above is only meaningful when admin:peers is withheld from collaborator tokens, as the default templates do. Any token carrying admin:peers bypasses both scope and namespace checks entirely — see the Operator and Host-Local Boundary section for details.
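The check order described above can be sketched as follows. The token dictionary shape and function signature are assumptions for illustration; the admin:peers bypass and sub-directory namespace granularity are the documented behavior:

```python
def authorize(token: dict, operation: str, path: str) -> bool:
    """Scope plus namespace-prefix authorization, with the admin bypass."""
    if "admin:peers" in token["scopes"]:
        return True  # superuser scope bypasses both scope and namespace checks
    if operation not in token["scopes"]:
        return False
    key = "write_namespaces" if operation.endswith(":write") else "read_namespaces"
    # Sub-directory granularity: a grant for "memory/coordination" does NOT
    # cover the sibling "memory/continuity".
    return any(path == ns or path.startswith(ns + "/") for ns in token[key])
```

This is why withholding admin:peers from collaborator tokens is the load-bearing decision: every other restriction is evaluated only after that bypass fails.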

What crosses the boundary

Current handoff/shared coordination work allows bounded coordination-facing data to cross the peer boundary, especially:

What does not happen automatically

The intended reading is:

remote coordination artifacts are evidence and advice, not automatic local truth

Coordination Model

CogniRelay provides three bounded coordination primitives for inter-agent work. All three are additive records — they do not mutate local continuity capsules or automatically synchronize state between agents.

Handoffs

A handoff projects a bounded subset of one agent’s active continuity capsule (only active_constraints and drift_signals) into an auditable artifact for another agent. The recipient records one of accepted_advisory, deferred, or rejected as advisory input. Nothing is promoted into local continuity automatically.
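The projection is easy to picture as a filter. The helper name and envelope fields are hypothetical; the bounded field set (active_constraints, drift_signals) and the advisory dispositions are from the document:

```python
def project_handoff(capsule: dict, sender: str, recipient: str) -> dict:
    """Project only the bounded capsule subset that a handoff may carry."""
    return {
        "from": sender,
        "to": recipient,
        "active_constraints": capsule.get("active_constraints", []),
        "drift_signals": capsule.get("drift_signals", []),
        # The recipient later records exactly one advisory disposition:
        # accepted_advisory | deferred | rejected.
        "status": "pending",
    }
```

Everything else in the capsule, including negative_decisions, stays owner-private.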

Shared coordination artifacts

An owner-authored artifact that exposes bounded coordination state (constraints, drift_signals, coordination_alerts) to a listed participant set. Participants can read the artifact; only the owner can update it. Shared artifacts are coordination context, not shared capsules.

Reconciliation records

When handoff or shared coordination claims visibly disagree, a reconciliation record names the bounded dispute — the claims, epistemic status, and evidence — without resolving it by fiat. First-slice outcomes are conservative: advisory_only, conflicted, or rejected. Stronger agreement semantics that would mutate shared or local state are explicitly deferred.
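As a sketch, a reconciliation record might carry a shape like the following. Field names beyond claims, epistemic status, evidence, and the three first-slice outcomes are assumptions:

```python
# Illustrative record shape -- names the dispute, does not resolve it.
record = {
    "dispute": "conflicting active rate-limit constraint",
    "claims": [
        {"source": "handoff:peer-a", "claim": "limit is 10 rps",
         "epistemic_status": "asserted", "evidence": "shared artifact rev 3"},
        {"source": "shared:peer-b", "claim": "limit is 50 rps",
         "epistemic_status": "observed", "evidence": "local load-test log"},
    ],
    # First-slice outcomes are conservative; none mutates local state.
    "outcome": "conflicted",
}
```

Nothing in the record instructs either agent to change local continuity; it only makes the disagreement auditable.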

What ties them together

All three primitives follow the same principle: coordination artifacts are evidence and advice, not automatic local truth. Discovery is bounded by caller identity. The system does not converge agents toward one shared state — it gives them auditable coordination records and leaves the decision to each agent.

Operator and Host-Local Boundary

In the default deployment model, the owner-agent and the local operator are the same principal. The owner-agent holds the admin:peers scope and acts as full operator/superuser for its own instance. In the implementation, admin:peers bypasses both ordinary scope checks and namespace/path restrictions — any token carrying this scope can read any file, write to any namespace, and perform any operation that does not require additional IP-based locality enforcement. Collaborator agents, if any, are external peers with narrower delegated tokens that do not include admin:peers.

CogniRelay exposes two distinct operational surfaces:

Agent-facing collaboration surface

Memory, retrieval, continuity, coordination, messaging, tasks, patches, and peer discovery. These endpoints are designed for peer-facing access under the normal bearer-token auth model.

Host-local authority surface

This surface has two enforcement tiers:

Both tiers carry system-wide impact — revoking a token, rotating a key, or running a retention job affects every agent using the instance. In the default model, admin:peers belongs exclusively to the owner-agent/operator and should not be granted to collaborator or replication peers. The replication_peer governance template uses the narrower replication:sync scope with explicit write namespace grants instead of admin:peers — it cannot manage tokens, rotate keys, or perform backup/restore operations. If automating authority actions, run them through a local scheduler (systemd, cron) invoked across the host-local boundary.

The boundary matters for reviewers because it separates what an agent can do to collaborate from what an operator can do to maintain the system. Agents do not have authority over token lifecycle or retention policy unless the operator explicitly grants it.

How To Read The Docs

Use the docs in this order:

  1. README.md Start here for repo shape, quick start, and the canonical doc map.
  2. docs/agent-onboarding.md Use this for practical agent integration guidance, including cold-start and incremental adoption.
  3. docs/reviewer-guide.md Use this document for the system thesis, boundaries, and non-goals.
  4. docs/system-overview.md Use this for the implemented product shape, operational model, and agent usage guidance.
  5. docs/api-surface.md Use this for the currently implemented HTTP behavior and endpoint grouping.
  6. docs/payload-reference.md Use this for capsule structure, request/response schemas, and field-level constraints.
  7. docs/mcp.md Use this if you care about MCP integration and tool exposure.
  8. deploy/GO_LIVE_RUNBOOK.md and deploy/PRODUCTION_SIGNOFF_CHECKLIST.md Use these for operator-facing deployment and signoff concerns.

Pre-Review Hardening Summary

Before requesting external review, CogniRelay went through a structured hardening workflow (tracked in #92). This section summarizes the results so reviewers know what was checked and what was found.

Review Baseline

The review baseline is branch main at commit 1217cb7. All stages below were evaluated against this post-hardening state. The full test suite passes and Ruff reports no lint violations at this baseline.

Scientific Crosswalk (Stage B)

A source-to-system crosswalk compared the implemented system against the motivating external material:

Key findings:

Full crosswalk detail: #93; follow-up docs: #94, PR #95.

Robustness Findings and Resolutions (Stage C)

Stage C reviewed the implementation as mission-critical continuity infrastructure under adverse conditions.

Findings and fixes:

  1. Git index serialization (high): concurrent commits could interfere through the shared git index. Fixed by adding repository-level git mutation serialization. #98, PR #101.
  2. Same-subject continuity locking (high): concurrent mutations to the same continuity subject could race through write/commit/rollback. Fixed by adding per-subject continuity mutation locking. #97, PR #99.
  3. Rollback hardening (high): additional rollback edge cases discovered during the locking work. Fixed with broader mutation-path hardening. #100, PR #102.
  4. Raw-scan performance cliff (high): degraded index fallback performed a full-repo sweep under missing/corrupt index conditions. Fixed by bounding the fallback scan. #104, PR #105.

No new crash-path findings in backup/restore-test behavior or in maintenance degraded paths.

Full detail: #96 (slice 1), #103 (slice 2).

Retention and Lifecycle Findings and Resolutions (Stage D)

Stage D evaluated whether retention, backup, compaction, and cost-control mechanics are coherent and agent-respecting.

Findings and outcomes:

  1. Continuity retention policy: the system labeled retention states (active, fallback, archive_recent, archive_stale) but lacked an executable operator workflow for stale archives. Fixed by implementing a host-local retention-policy path. #107, PR #111.
  2. Semi-cold storage mechanism: no implemented model for compressed/searchable low-priority storage. Fixed by implementing a semi-cold storage path with explicit rehydrate semantics. #108, PR #110.
  3. Repo-wide lifecycle substrate: different namespaces had no common lifecycle architecture. Resolved by designing and implementing a shared lifecycle substrate with namespace-specific tuning. #109; tuning specs: #112 (PR #115), #113 (PR #117), #114 (PR #125).

Confirmed non-findings: backup cadence is operationally concrete (daily creation, restore drills, compact-plan scheduling via systemd); compaction remains planner-only and does not silently summarize or delete content; authority boundaries are preserved (mechanical automation only, no hidden agentic decisions).

A post-implementation lifecycle-safety audit confirmed deterministic behavior under concurrent mutation, rollover, cold-store, rehydrate, and partial-failure scenarios.

Full detail: #106 (stage controller).

Post-#119 Collaborator-Grade Continuity (Stages E–F)

After the hardening stages, CogniRelay completed the #119 family — a collaborator-grade continuity wave that extends the orientation substrate with higher-level capabilities. These are additive features layered onto the existing capsule lifecycle and storage architecture:

Full detail for each feature is in Payload Reference (field-level schemas) and API Surface (endpoint behavior and changelog).

Known Limitations and Intentional Deferrals

The following are known boundaries of the current system, not unresolved bugs:

What Reviewers Should Pressure-Test

The most important review questions are not “does it have many features?” They are:

Continuity model

Degradation and recovery

Inter-agent boundaries

Retention and lifecycle

Operator boundary

Collaborator-grade continuity (#119 family)

Documentation fidelity

Review Materials

The following materials form the complete review surface:

Documentation

Hardening workflow

Source material