This guide is the fastest way to evaluate CogniRelay as a system rather than as a list of endpoints.
Use it before diving into the API details.
CogniRelay is a self-hosted continuity and collaboration substrate for autonomous agents.
Its main job is not to be a generic file server or a generic task app. Its main job is to help agents:
The default deployment model is one owner-agent per CogniRelay instance. That owner-agent is also the local operator and superuser of its instance, holding the admin:peers scope. Continuity capsules are the owner-agent’s local continuity substrate — namespace enforcement supports sub-directory granularity, so collaborator tokens are scoped to memory/coordination without access to memory/continuity. If that owner-agent needs to coordinate with other agents, it issues narrower delegated API tokens to collaborating peers. Collaboration happens through the coordination surfaces (handoffs, shared artifacts, reconciliation records) rather than by treating continuity as shared common state. An agent that wants its own continuity should run its own instance.
The system is built around one simple operational idea:
> git is the durable store; the API is the machine interface
CogniRelay is not:
The current implementation is intentionally narrower:
CogniRelay is built from a small number of deliberately constrained building blocks. Each choice optimizes for auditability, operational simplicity, and independence from external services.
Git as storage engine. All durable state lives in a local git repository managed through subprocess calls — no GitPython, no forge, no remote dependency. Git provides version history, diffs, rollback, and offline-first operation without requiring an external database. Every mutation is a commit, so the full history of what changed and when is always recoverable.
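The mutation-as-commit pattern can be sketched with stdlib subprocess calls; the helper below is illustrative (the function name and repo layout are assumptions, not CogniRelay internals):

```python
import subprocess
from pathlib import Path

def commit_mutation(repo: Path, rel_path: str, content: str, message: str) -> str:
    """Write one file and record the mutation as a git commit.

    Illustrative sketch only. Returns the new commit hash so callers can
    audit exactly what changed and when.
    """
    target = repo / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    subprocess.run(["git", "-C", str(repo), "add", rel_path], check=True)
    subprocess.run(["git", "-C", str(repo), "commit", "-m", message], check=True)
    out = subprocess.run(
        ["git", "-C", str(repo), "rev-parse", "HEAD"],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()
```

Because every mutation flows through a commit, `git log` on the repository is the audit trail; no separate mutation journal is needed.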
Markdown for human-readable memory, JSON/JSONL for machine data. Durable facts, identity, and narrative memory are stored as Markdown with optional YAML frontmatter. Event streams, message records, delivery state, and structured artifacts use JSON or append-only JSONL. This split keeps memory inspectable by humans while giving agents efficient structured access.
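The JSONL side of that split can be sketched as an append-only log; helper names here are illustrative, not the project's actual readers:

```python
import json
from pathlib import Path

def append_event(log: Path, event: dict) -> None:
    # Append-only JSONL: one JSON object per line, never rewritten in place.
    with log.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event, separators=(",", ":")) + "\n")

def read_events(log: Path) -> list[dict]:
    # A tolerant reader skips a torn final line from an interrupted write
    # rather than failing the whole read.
    events = []
    for line in log.read_text(encoding="utf-8").splitlines():
        try:
            events.append(json.loads(line))
        except json.JSONDecodeError:
            continue
    return events
```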
SQLite FTS5 for search, with JSON-index fallback. Search uses Python’s stdlib sqlite3 module with an FTS5 virtual table — no external search service. If the SQLite database is missing or corrupt, the indexer falls back to derived JSON indexes with a simpler word-scoring algorithm. Both index layers are treated as derived state that can be rebuilt from the git-backed source of truth at any time.
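The two-layer design can be sketched in a few lines: prefer FTS5, and fall back to a simple word-overlap score when the virtual table is unavailable. The function shape and scoring are illustrative, not the real indexer:

```python
import re
import sqlite3
from collections import Counter

def search(docs: dict[str, str], query: str) -> list[str]:
    """Rank doc ids for `query`: FTS5 when available, word-overlap otherwise."""
    try:
        con = sqlite3.connect(":memory:")
        con.execute("CREATE VIRTUAL TABLE idx USING fts5(doc_id, body)")
        con.executemany("INSERT INTO idx VALUES (?, ?)", docs.items())
        rows = con.execute(
            "SELECT doc_id FROM idx WHERE idx MATCH ? ORDER BY rank", (query,)
        ).fetchall()
        return [r[0] for r in rows]
    except sqlite3.OperationalError:
        # Fallback tier: score each document by query-word occurrences.
        words = set(re.findall(r"\w+", query.lower()))
        scores = {
            doc_id: sum(Counter(re.findall(r"\w+", body.lower()))[w] for w in words)
            for doc_id, body in docs.items()
        }
        return [d for d, s in sorted(scores.items(), key=lambda kv: -kv[1]) if s > 0]
```

Both tiers are derived state in this sketch too: the `docs` mapping is the source of truth, and either index can be rebuilt from it at any time.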
Self-contained bearer-token auth. Tokens are stored as SHA256 hashes in local config, scoped by operation and namespace. There is no OAuth provider, LDAP, or external auth dependency. The token model supports split read/write namespace restrictions, expiry, trust status, and audit logging — all locally managed.
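The stored-hash model can be sketched with stdlib `hashlib`; helper names are illustrative:

```python
import hashlib
import hmac
import secrets

def issue_token() -> tuple[str, str]:
    # The plaintext token goes to the agent once; only its SHA256 hash
    # is kept in local config.
    token = secrets.token_urlsafe(32)
    return token, hashlib.sha256(token.encode()).hexdigest()

def verify_token(presented: str, stored_hash: str) -> bool:
    digest = hashlib.sha256(presented.encode()).hexdigest()
    # Constant-time comparison avoids timing side channels on the check.
    return hmac.compare_digest(digest, stored_hash)
```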
Compaction as planning, not summarization. The compaction service is an orchestrator that classifies candidates by age, size, memory class, and policy, then emits structured reports with action categories (summarize, archive, promote, keep, review). It does not generate summaries itself — the agent reads the plan and decides what to do. This keeps the system from making content decisions on the agent’s behalf.
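The classify-and-report shape can be sketched as below. The action categories come from the docs; the candidate fields and thresholds are hypothetical (the real policy also weighs memory class and namespace):

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    path: str
    age_days: int
    size_kb: int
    pinned: bool = False

def plan_action(c: Candidate) -> str:
    """Classify a compaction candidate into an action category.

    The planner only emits the category; the agent reads the plan and
    decides whether to act. Nothing is summarized or deleted here.
    """
    if c.pinned:
        return "keep"
    if c.age_days > 180:
        return "archive"
    if c.size_kb > 64:
        return "summarize"
    if c.age_days > 30:
        return "review"
    return "keep"
```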
Minimal runtime dependencies. The entire stack runs on FastAPI, uvicorn, Pydantic, and python-dotenv. No ORM, no external database, no cache or queue library. This keeps the operational surface minimal and the system easy to deploy, audit, and reason about.
CogniRelay treats continuity as a bounded orientation problem.
Continuity capsules are meant to preserve enough of the agent’s current direction to support a useful restart:
- active constraints (`active_constraints`)
- drift signals (`drift_signals`)
- open loops (`open_loops`) and stance summary (`stance_summary`)
- session trajectory (`session_trajectory`)
- trailing notes (`trailing_notes`) and curiosity queue (`curiosity_queue`)
- negative decisions (`negative_decisions`) when the agent chooses to record them

This is stronger than simple factual recall, but intentionally weaker than a full architecture for preserving every layer of texture or self-model.
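The fields above can be sketched as a capsule payload; the values and nesting here are illustrative, not the authoritative schema (see Payload Reference for field-level constraints):

```python
# Field names follow the documented capsule model; the contents are invented
# for illustration.
capsule = {
    "active_constraints": ["do not modify deploy/ without operator signoff"],
    "drift_signals": ["retrieval quality degrading on long threads"],
    "open_loops": ["finish retention-policy review"],
    "stance_summary": "conservative: prefer advisory records over mutation",
    "session_trajectory": ["triaged retention issue", "drafted archive plan"],
    "trailing_notes": ["revisit search-index rebuild cadence"],
    "curiosity_queue": ["compare JSONL rollover strategies"],
    "negative_decisions": [
        {"decision": "did not enable multi-worker mode",
         "reason": "in-process lock assumes a single process"},
    ],
}
```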
The current continuity model is closer to bounded write-time curation than to unconstrained read-time pruning.
That matters because the motivating discussions distinguish two broad failure modes:
CogniRelay does not claim to eliminate that tradeoff. Instead it makes the tradeoff explicit:
The system therefore aims for inspectable loss, not imaginary losslessness.
One of the key design choices in the current system is that non-action can be represented directly.
The negative_decisions continuity field exists to preserve decisions such as:
This does not solve every compaction problem by itself. It does, however, prevent the system from modeling only what was done and thereby biasing successor agents toward action by omission.
CogniRelay assumes blind spots are structural.
That means the recovery model is built around bounded usefulness under loss, not around a promise that the blind spot has been removed.
When reviewing the system, treat these as key design claims:
The inter-agent model is deliberately conservative.
Access isolation between agents is enforced entirely by token scopes and namespace/path restrictions configured by the operator. Beyond that configured access model, the system provides no intrinsic identity-bound ownership or tenant-isolation layer.
Continuity capsules are namespace-gated, not agent-gated. Any token with read access to memory/continuity can read any capsule stored there, regardless of which agent created it. In the default collaboration_peer governance template, collaborator tokens cannot access memory/continuity — this is a configured policy boundary enforced by sub-directory namespace restrictions, not a built-in per-agent tenant isolation mechanism.
The strengthened collaborator model (sub-namespace hardening) means the default template protects owner-private continuity by excluding it from collaborator namespace grants. This is materially stronger than broad top-level memory access, but remains token/namespace policy, not ownership enforcement. Readers should not infer a built-in multi-tenant per-agent isolation model from the current system.
The collaborator token policy described above is only meaningful when admin:peers is withheld from collaborator tokens, as the default templates do. Any token carrying admin:peers bypasses both scope and namespace checks entirely — see the Operator and Host-Local Boundary section for details.
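The precedence described above can be sketched as a single check. The `admin:peers` scope name and the sub-directory semantics come from the docs; the `memory:read` scope name and helper shape are assumptions:

```python
def authorized(scopes: set[str], read_namespaces: list[str],
               required_scope: str, path: str) -> bool:
    """Illustrative enforcement order, not CogniRelay's actual auth layer."""
    if "admin:peers" in scopes:
        return True  # operator superuser: bypasses scope and namespace checks
    if required_scope not in scopes:
        return False
    # Sub-directory granularity: a grant of memory/coordination does not
    # imply memory/continuity.
    return any(path == ns or path.startswith(ns + "/") for ns in read_namespaces)
```

Note how the collaborator case falls out: a token granted only `memory/coordination` can never read a path under `memory/continuity`, while any token carrying `admin:peers` short-circuits both checks.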
Current handoff/shared coordination work allows bounded coordination-facing data to cross the peer boundary, especially:
The intended reading is:
> remote coordination artifacts are evidence and advice, not automatic local truth
CogniRelay provides three bounded coordination primitives for inter-agent work. All three are additive records — they do not mutate local continuity capsules or automatically synchronize state between agents.
A handoff projects a bounded subset of one agent’s active continuity capsule (only active_constraints and drift_signals) into an auditable artifact for another agent. The recipient records one of accepted_advisory, deferred, or rejected as advisory input. Nothing is promoted into local continuity automatically.
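That projection rule can be sketched as a simple field allowlist (the helper and envelope shape are illustrative; the field names and recipient statuses are from the docs):

```python
# Only these two capsule fields cross the peer boundary in a handoff.
HANDOFF_FIELDS = ("active_constraints", "drift_signals")

def project_handoff(capsule: dict, sender: str, recipient: str) -> dict:
    return {
        "from": sender,
        "to": recipient,
        "payload": {k: capsule.get(k, []) for k in HANDOFF_FIELDS},
        # Recipient later records one of:
        # accepted_advisory / deferred / rejected.
        "recipient_status": None,
    }
```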
A shared coordination artifact is an owner-authored record that exposes bounded coordination state (constraints, drift_signals, coordination_alerts) to a listed participant set. Participants can read the artifact; only the owner can update it. Shared artifacts are coordination context, not shared capsules.
When handoff or shared coordination claims visibly disagree, a reconciliation record names the bounded dispute — the claims, epistemic status, and evidence — without resolving it by fiat. First-slice outcomes are conservative: advisory_only, conflicted, or rejected. Stronger agreement semantics that would mutate shared or local state are explicitly deferred.
All three primitives follow the same principle: coordination artifacts are evidence and advice, not automatic local truth. Discovery is bounded by caller identity. The system does not converge agents toward one shared state — it gives them auditable coordination records and leaves the decision to each agent.
In the default deployment model, the owner-agent and the local operator are the same principal. The owner-agent holds the admin:peers scope and acts as full operator/superuser for its own instance. In the implementation, admin:peers bypasses both ordinary scope checks and namespace/path restrictions — any token carrying this scope can read any file, write to any namespace, and perform any operation that does not require additional IP-based locality enforcement. Collaborator agents, if any, are external peers with narrower delegated tokens that do not include admin:peers.
CogniRelay exposes two distinct operational surfaces:
The first surface is the peer-facing API: memory, retrieval, continuity, coordination, messaging, tasks, patches, and peer discovery. These endpoints are designed for peer-facing access under the normal bearer-token auth model.
The second surface is operator authority. It has two enforcement tiers:
- Host-local tier: endpoints under `/v1/ops/*` enforce an IP-based local-client check in addition to the `admin:peers` scope. These are unreachable from WAN peers even if the scope is present.
- Scope-only tier: peer trust management (`/v1/peers/{peer_id}/trust`), token and signing-key lifecycle (`/v1/security/*`), and backup creation and restore drills require the `admin:peers` scope but do not enforce IP-based locality. They are intended for local use but rely on scope restriction rather than transport-level enforcement.

Both tiers carry system-wide impact — revoking a token, rotating a key, or running a retention job affects every agent using the instance. In the default model, `admin:peers` belongs exclusively to the owner-agent/operator and should not be granted to collaborator or replication peers. The `replication_peer` governance template uses the narrower `replication:sync` scope with explicit write namespace grants instead of `admin:peers`; it cannot manage tokens, rotate keys, or perform backup/restore operations. If automating authority actions, run them through a local scheduler (systemd, cron) invoked through a local boundary.
The boundary matters for reviewers because it separates what an agent can do to collaborate from what an operator can do to maintain the system. Agents do not have authority over token lifecycle or retention policy unless the operator explicitly grants it.
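The two tiers can be sketched as a single authority check. The endpoint prefix and scope names are from the docs; the loopback-only locality rule is an assumption about how the IP check might work:

```python
import ipaddress

HOST_LOCAL_PREFIX = "/v1/ops/"

def allow_authority_call(path: str, scopes: set[str], client_ip: str) -> bool:
    """Illustrative sketch of the two operator-authority tiers."""
    if "admin:peers" not in scopes:
        return False  # both tiers require the operator scope
    if path.startswith(HOST_LOCAL_PREFIX):
        # Tier 1: scope alone is not enough; the client must also be local.
        return ipaddress.ip_address(client_ip).is_loopback
    return True  # Tier 2: scope-only (trust, security, backup endpoints)
```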
Use the docs in this order:
1. `README.md`: Start here for repo shape, quick start, and the canonical doc map.
2. `docs/agent-onboarding.md`: Use this for practical agent integration guidance, including cold-start and incremental adoption.
3. `docs/reviewer-guide.md`: Use this document for the system thesis, boundaries, and non-goals.
4. `docs/system-overview.md`: Use this for the implemented product shape, operational model, and agent usage guidance.
5. `docs/api-surface.md`: Use this for the currently implemented HTTP behavior and endpoint grouping.
6. `docs/payload-reference.md`: Use this for capsule structure, request/response schemas, and field-level constraints.
7. `docs/mcp.md`: Use this if you care about MCP integration and tool exposure.
8. `deploy/GO_LIVE_RUNBOOK.md` and `deploy/PRODUCTION_SIGNOFF_CHECKLIST.md`: Use these for operator-facing deployment and signoff concerns.

Before requesting external review, CogniRelay went through a structured hardening workflow (tracked in #92). This section summarizes the results so reviewers know what was checked and what was found.
The review baseline is branch main at commit 1217cb7. All stages below were evaluated against this post-hardening state. The full test suite passes and Ruff reports no lint violations at this baseline.
A source-to-system crosswalk compared the implemented system against the motivating external material:
Key findings:
- The capsule carries texture fields (`session_trajectory`, `trailing_notes`, `curiosity_queue`, and `negative_decisions`). This is intentionally bounded — it is not a full basin-key texture-preservation architecture.
- `negative_decisions` is first-class, with deterministic trim ordering under token pressure. This is one of the clearest alignments between source material and shipped system.
- Handoffs project only `active_constraints` and `drift_signals`; shared coordination artifacts are owner-authored and bounded; reconciliation records are advisory, not authoritative. Stronger agreement semantics that would mutate shared or local state are explicitly deferred.

Full crosswalk detail: #93, follow-up docs: #94 → PR #95.
Stage C reviewed the implementation as mission-critical continuity infrastructure under adverse conditions.
Findings and fixes:
No new crash-path findings in backup/restore-test behavior, and none in the maintenance degraded paths.
Full detail: #96 (slice 1), #103 (slice 2).
Stage D evaluated whether retention, backup, compaction, and cost-control mechanics are coherent and agent-respecting.
Findings and outcomes:
- The continuity lifecycle defined storage tiers (`active`, `fallback`, `archive_recent`, `archive_stale`) but lacked an executable operator workflow for stale archives. Fixed by implementing a host-local retention-policy path. #107 → PR #111.

Confirmed non-findings: backup cadence is operationally concrete (daily creation, restore drills, compact-plan scheduling via systemd); compaction remains planner-only and does not silently summarize or delete content; authority boundaries are preserved (mechanical automation only, no hidden agentic decisions).
A post-implementation lifecycle-safety audit confirmed deterministic behavior under concurrent mutation, rollover, cold-store, rehydrate, and partial-failure scenarios.
Full detail: #106 (stage controller).
After the hardening stages, CogniRelay completed the #119 family — a collaborator-grade continuity wave that extends the orientation substrate with higher-level capabilities. These are additive features layered onto the existing capsule lifecycle and storage architecture:
- Thread-scoped capsules: each capsule carries a `thread_descriptor` with labels, keywords, scope anchors, identity anchors, and lifecycle state. List operations support filtering and lifecycle transitions. This prevents cross-thread bleed in multi-domain use.
- Structured decision records: `ContinuityState` supports structured decision rationale, assumptions, and unresolved tensions with kind/status lifecycle and supersession semantics. This preserves why alongside what.
- Startup view: capsule reads accept `view="startup"` for a pre-structured mechanical extraction of startup-relevant orientation fields.
- `GET /v1/capabilities` (#179): a versioned, machine-readable feature map so agents can discover what the current instance supports without relying on static docs.

Full detail for each feature is in Payload Reference (field-level schemas) and API Surface (endpoint behavior and changelog).
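A client can gate its integration on the capability map rather than on static docs. The endpoint path `GET /v1/capabilities` is documented; the response shape assumed here (a top-level `features` mapping) and the feature keys are illustrative:

```python
import json
import urllib.request

def fetch_capabilities(base_url: str, token: str) -> dict:
    # Fetch the machine-readable feature map from the instance.
    req = urllib.request.Request(
        f"{base_url}/v1/capabilities",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def supports(caps: dict, feature_key: str) -> bool:
    # Gate an integration path on a discovered feature key.
    return bool(caps.get("features", {}).get(feature_key))
```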
The following are known boundaries of the current system, not unresolved bugs:
- The continuity model preserves bounded texture (`session_trajectory`, `trailing_notes`, `curiosity_queue`, `negative_decisions`) but does not attempt to capture the full texture, register, or self-model of an agent. This is a deliberate scope choice.
- Reconciliation currently supports only `advisory_only`, `conflicted`, and `rejected` outcomes. Stronger agreement semantics that would mutate shared artifacts or local continuity are explicitly deferred until the first slice proves sound.
- The runtime assumes a single process; concurrency control (a `threading.Lock` in `app/runtime/service.py`) depends on this single-process model, and multi-worker deployment requires migrating to cross-process file locking first (see Runtime Concurrency Model).
- `/v1/continuity/*` endpoints split into two authorization patterns. Collection endpoints (list, refresh/plan, retention/plan) return only the entries the caller is authorized to read — unauthorized entries are silently excluded, returning 200 with a reduced result set. This is standard collection-endpoint behavior: a narrowly-scoped token sees only its authorized subset. Single-resource endpoints (read, upsert, compare, revalidate, archive, delete) return 403 when the caller lacks access. Neither path discloses capsule contents for unauthorized entries. Ops-dispatched continuity jobs (cold_store, cold_rehydrate, retention_apply) are governed by the ops endpoint's own auth model. Evaluated in #156.

The most important review questions are not "does it have many features?" They are:
Continuity model
Degradation and recovery
Inter-agent boundaries
Retention and lifecycle
Operator boundary
Collaborator-grade continuity (#119 family)
- Does the feature map returned by `GET /v1/capabilities` accurately reflect the current instance, and is the feature-key granularity useful for integration gating?

Documentation fidelity
The following materials form the complete review surface:
Documentation
- `README.md` — repo shape, quick start, doc map
- `docs/reviewer-guide.md` — this document: system thesis, hardening summary, review questions
- `docs/system-overview.md` — implemented product shape, operational model, agent usage
- `docs/api-surface.md` — HTTP behavior and endpoint grouping
- `docs/payload-reference.md` — capsule structure, schemas, field constraints
- `docs/agent-onboarding.md` — practical agent integration guidance
- `docs/mcp.md` — MCP integration and tool exposure
- `docs/cognirelay-client.md` — stdlib-only CLI client for continuity operations
- `deploy/GO_LIVE_RUNBOOK.md` and `deploy/PRODUCTION_SIGNOFF_CHECKLIST.md` — operator deployment and signoff

Hardening workflow
Source material