CogniRelay

System Overview

Purpose

CogniRelay is a self-hosted collaboration and memory service for autonomous agents. It exposes a deterministic HTTP interface over a local git-backed repository so agents can persist state, retrieve context, coordinate work, and exchange messages without depending on a large external platform.

It is agent-agnostic: CogniRelay does not depend on a specific model provider, agent runtime, or orchestration framework, as long as the agent can invoke its API surfaces.

The core design principle is simple:

git is the storage engine; the API is the machine interface

This system should be read as a bounded continuity and orientation substrate. It aims to preserve enough state for useful continuation and recovery, while making degradation, fallback, and authority boundaries explicit rather than pretending persistence is lossless.

Practical Application Areas

CogniRelay is most useful in environments where agent work spans multiple sessions, interruptions are routine, and continuity must be recoverable and bounded rather than assumed.

Software engineering

Research and analysis

Operations and internal tooling

Multi-agent collaboration

Customer-facing and service workflows

Education, tutoring, and advisory contexts

In all these areas the common thread is that interruptions — session resets, context-window compaction, handoffs between agents or humans — are structural, not exceptional. CogniRelay makes the cost of those interruptions explicit and recoverable rather than silent and cumulative.

What the current system provides for these use cases

The application areas above are grounded in capabilities the system currently implements:

Research and Evaluation Value

CogniRelay is also a useful artifact for studying questions about agent continuity, recovery, and long-horizon collaboration. It is not a formal academic project, but it implements enough of a concrete continuity substrate that researchers and evaluators can use it as a testbed for empirical work.

For external experiments, third-party usage notes, and public case studies tied to CogniRelay, see External References and Case Studies.

Agent continuity and session-boundary recovery

The system’s explicit capsule lifecycle (active → fallback → archive → cold storage), deterministic trust signals, and structured degradation paths provide concrete surfaces for measuring:

Human-AI interaction and trust

The trust-signaling surface (recency, completeness, integrity, scope match) and the distinction between explicit orientation and implicit inference create testable questions:

Evaluation and benchmarking

The deterministic nature of CogniRelay’s retrieval, ranking, and trust-signaling paths makes them amenable to controlled evaluation:

Interpretability and memory structure

CogniRelay externalizes agent orientation into inspectable, structured artifacts rather than leaving it implicit in model weights or conversation history:

Distributed and multi-agent continuity

The owner-per-instance deployment model, delegated token scoping, and bounded coordination primitives provide infrastructure for studying:

Digital identity and continuity

CogniRelay’s model — where an agent’s orientation is externalized into durable, bounded, inspectable artifacts that survive context-window resets — touches questions about:

These are open questions. CogniRelay does not claim to answer them, but it provides a concrete, operational system against which they can be empirically investigated.

Default Deployment Topology

The default deployment is one owner-agent per CogniRelay instance.

The system should not be read as a peer-equal shared-instance platform. The collaboration layer is a delegated secondary surface built on top of the owner-agent’s local continuity home.

Optional operator UI

The shipped operator UI is an optional local-operator observability surface mounted under /ui.

The supported posture keeps COGNIRELAY_UI_REQUIRE_LOCALHOST=true, so /ui remains a loopback-scoped operator surface rather than a normal remotely exposed web app. The schedule page is a list-only inspection view backed by the shipped schedule list service. Schedule mutation pages, schedule SSE/live updates, background scheduler loops, non-local auth/session models, mutation actions, standalone archive/cold maintenance consoles, WebSockets, and broader reactive UI behavior are deferred to future explicit issues; the current deployment model does not imply them.

Access isolation between agents is enforced entirely by token scopes and namespace/path restrictions. Beyond that configured access model, the system provides no separate intrinsic identity-bound ownership or tenant isolation layer. Any token with read access to memory/continuity can read any capsule in that namespace, so capsule privacy depends on the operator not granting that access to collaborator tokens. The default collaboration_peer template excludes this access, so owner-private continuity is protected by configured policy rather than by an intrinsic isolation layer.

Runtime Concurrency Model

The default deployment runs a single uvicorn worker process (no --workers flag). This is intentional: the rate-limit state protection in app/runtime/service.py uses a threading.Lock to serialize read-modify-write cycles on logs/rate_limit_state.json. This lock is correct within a single process but does not protect across OS processes.
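A minimal sketch of the single-process pattern described above, assuming the state file holds a JSON object mapping a client key to a request count. The function name `record_request` and the state layout are hypothetical; this is not the code in app/runtime/service.py.

```python
import json
import threading
from pathlib import Path

# Process-local lock: serializes read-modify-write cycles within ONE process only.
_rate_limit_lock = threading.Lock()


def record_request(state_path: Path, client_key: str) -> int:
    """Increment a per-client counter in a JSON state file (hypothetical layout)."""
    with _rate_limit_lock:
        # Read: a missing file means empty state.
        if state_path.exists():
            state = json.loads(state_path.read_text(encoding="utf-8"))
        else:
            state = {}
        # Modify in memory.
        count = int(state.get(client_key, 0)) + 1
        state[client_key] = count
        # Write back while still holding the lock so threads cannot interleave cycles.
        state_path.write_text(json.dumps(state), encoding="utf-8")
        return count
```

Because `threading.Lock` lives in process memory, two uvicorn workers would each hold their own lock and could still interleave cycles on the same file.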

If the deployment model changes to multiple uvicorn workers, the rate-limit lock must be replaced with a cross-process mechanism. The recommended strategy is fcntl.flock on a dedicated lockfile, following the pattern already established by:

The existing lock-ordering rule applies: the rate-limit lock must remain the innermost lock in any acquisition chain.

Do not add --workers to the uvicorn command without completing this migration.
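A sketch of the cross-process replacement, assuming a POSIX host (the `fcntl` module is Unix-only) and a dedicated lockfile next to the state file. The `.lock` naming and the helper names are hypothetical, and the established pattern cited above may differ in detail.

```python
import fcntl
import json
import os
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def exclusive_file_lock(lock_path: Path):
    """Hold an exclusive advisory flock on a dedicated lockfile (POSIX only)."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until no other holder remains
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def record_request_cross_process(state_path: Path, client_key: str) -> int:
    """Same read-modify-write cycle, serialized across OS processes."""
    lock_path = state_path.with_name(state_path.name + ".lock")  # hypothetical name
    # Innermost-lock rule: acquire any other locks BEFORE entering this block.
    with exclusive_file_lock(lock_path):
        state = {}
        if state_path.exists():
            state = json.loads(state_path.read_text(encoding="utf-8"))
        count = int(state.get(client_key, 0)) + 1
        state[client_key] = count
        state_path.write_text(json.dumps(state), encoding="utf-8")
        return count
```

Each call opens its own file descriptor, so the flock conflicts between threads as well as between worker processes.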

Architecture

CogniRelay combines a small number of building blocks:

Current Capability Areas

Memory and storage

Indexing and retrieval

Peer collaboration

Shared work coordination

Security and governance

Recovery and operations

System Models

Continuity model

CogniRelay treats continuity as a bounded orientation-preservation problem, not as total-fidelity persistence.

Continuity capsules preserve bounded working state across resets: active constraints (active_constraints), drift signals (drift_signals), open loops (open_loops), stance summary (stance_summary), session trajectory (session_trajectory), and optional lower-commitment fields such as trailing notes (trailing_notes), curiosity queue (curiosity_queue), and negative decisions (negative_decisions). The model uses write-time curation rather than unlimited retention — payloads are bounded, optional fields have a deterministic trim order under token pressure, and what is present, omitted, or archived is always explicit.
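A minimal sketch of deterministic trimming under token pressure. The trim order in `TRIM_ORDER` and the four-characters-per-token estimate are assumptions for illustration; the real order and token accounting are defined in the Payload Reference.

```python
import json

# Hypothetical trim order: lowest-commitment optional fields are dropped first.
TRIM_ORDER = ("trailing_notes", "curiosity_queue", "negative_decisions")


def estimate_tokens(payload: dict) -> int:
    """Crude token estimate (assumption): about 4 characters of JSON per token."""
    return len(json.dumps(payload, sort_keys=True)) // 4 + 1


def trim_continuity(continuity: dict, budget_tokens: int) -> tuple[dict, list[str]]:
    """Drop optional fields in a fixed order until the payload fits.

    Returns a trimmed copy plus the names of the trimmed fields, so what was
    omitted stays explicit to the caller. The input dict is not modified.
    """
    trimmed = dict(continuity)
    removed: list[str] = []
    for name in TRIM_ORDER:
        if estimate_tokens(trimmed) <= budget_tokens:
            break
        if name in trimmed:
            del trimmed[name]
            removed.append(name)
    return trimmed, removed
```

Because the order is a fixed tuple and the estimate is a pure function of the payload, the same capsule and budget always yield the same trimmed result.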

Continuity artifacts move through four tiers:

Retention planning and cold-store/rehydrate operations are explicit and operator-visible. The system aims for inspectable loss, not imaginary losslessness.

Trust signals

When a continuity capsule is returned through POST /v1/continuity/read or POST /v1/context/retrieve, the response includes a trust_signals block — an objective, mechanical trust assessment that the consuming agent can use to calibrate how much weight it places on the returned orientation data.

Trust signals are not heuristic confidence scores, AI-generated quality ratings, or probabilistic estimates. Every field is deterministically derived from data already present on the capsule or computed during retrieval. The same capsule at the same instant always produces the same signals. There is no model inference, no learned weighting, and no hidden state.

The four dimensions are:

On the context-retrieval path, trust signals participate in token-budget accounting. When the full shape would consume too much of the capsule’s allocation, a compact form is emitted instead (phase, orientation adequacy, trimmed flag, source state, health status, exact match — no ages or field lists). If even the compact form cannot fit, trust signals are null. This degradation is deterministic and surfaced via recovery_warnings.
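The full, then compact, then null fallback can be sketched as a pure function of the signals and the remaining budget. The key spellings in `COMPACT_KEYS`, the warning codes, and the token estimator are assumptions derived from the prose above, not the documented payload.

```python
import json

# Keys retained in the compact form (spellings assumed from the prose above).
COMPACT_KEYS = (
    "phase",
    "orientation_adequacy",
    "trimmed",
    "source_state",
    "health_status",
    "exact_match",
)


def estimate_tokens(obj) -> int:
    """Crude estimate (assumption): about 4 characters of JSON per token."""
    return len(json.dumps(obj, sort_keys=True)) // 4 + 1


def fit_trust_signals(full: dict, budget_tokens: int, warnings: list[str]):
    """Return the full shape, else the compact shape, else None; record degradation."""
    if estimate_tokens(full) <= budget_tokens:
        return full
    compact = {k: full[k] for k in COMPACT_KEYS if k in full}  # no ages or field lists
    if estimate_tokens(compact) <= budget_tokens:
        warnings.append("trust_signals_compacted")  # hypothetical warning code
        return compact
    warnings.append("trust_signals_omitted")  # hypothetical warning code
    return None
```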

An aggregate trust_signals block at the continuity_state level summarises the worst-case across all per-capsule signals: worst phase, oldest ages, any-fallback/any-degraded flags, and selector coverage counts. It handles mixed full/compact per-capsule shapes.
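A sketch of worst-case aggregation that tolerates mixed shapes: compact entries simply lack age fields, and null entries are skipped. The phase names and their ordering, the `age_seconds` key, and the mapping of "fallback" and "degraded" onto `source_state` and `health_status` values are all hypothetical.

```python
from typing import Optional

# Hypothetical phase ordering, best to worst.
PHASE_RANK = {"fresh": 0, "aging": 1, "stale": 2, "expired": 3}


def aggregate_trust(per_capsule: list[Optional[dict]], selectors_requested: int) -> dict:
    """Summarise the worst case across per-capsule trust signals (sketch)."""
    present = [s for s in per_capsule if s is not None]
    phases = [s["phase"] for s in present if s.get("phase") in PHASE_RANK]
    ages = [s["age_seconds"] for s in present if "age_seconds" in s]  # full shape only
    return {
        "worst_phase": max(phases, key=PHASE_RANK.__getitem__) if phases else None,
        "oldest_age_seconds": max(ages) if ages else None,
        "any_fallback": any(s.get("source_state") == "fallback" for s in present),
        "any_degraded": any(s.get("health_status") == "degraded" for s in present),
        "selectors_requested": selectors_requested,
        "selectors_with_signals": len(present),
    }
```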

Trust signals tell the consumer what the system mechanically knows about the capsule’s state. They do not tell the consumer what to do about it — that decision belongs to the agent.

For the full field-level structure including compact forms, aggregate shapes, and nullability rules, see Payload Reference.

Post-#119 continuity enhancements

The collaborator-grade continuity wave (#119 family) added several capabilities layered onto the existing continuity substrate. Each extends the system’s orientation-preservation model without changing the base capsule lifecycle or storage architecture.

Thread identity and scope boundaries (#120). Capsules can now carry a thread_descriptor with a label, keywords, scope anchors, identity anchors, and a lifecycle state (active/suspended/concluded/superseded). This gives agents deterministic thread-level scoping so unrelated threads do not bleed into each other. List operations support filtering by lifecycle, scope anchor, keyword, label, and identity anchor. Lifecycle transitions are atomic with upsert. See Payload Reference for the model.
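A sketch of the list filters as plain predicates over `thread_descriptor`. The key names (`lifecycle`, `scope_anchors`, `keywords`, `label`, `identity_anchors`) mirror the prose above, but their exact spelling and the exact-match semantics are assumptions.

```python
from typing import Iterable, Optional


def filter_threads(
    capsules: Iterable[dict],
    lifecycle: Optional[str] = None,
    scope_anchor: Optional[str] = None,
    keyword: Optional[str] = None,
    label: Optional[str] = None,
    identity_anchor: Optional[str] = None,
) -> list[dict]:
    """Keep capsules whose thread_descriptor matches every filter that was given."""
    result = []
    for cap in capsules:
        td = cap.get("thread_descriptor") or {}
        if lifecycle is not None and td.get("lifecycle") != lifecycle:
            continue
        if scope_anchor is not None and scope_anchor not in td.get("scope_anchors", []):
            continue
        if keyword is not None and keyword not in td.get("keywords", []):
            continue
        if label is not None and td.get("label") != label:
            continue
        if identity_anchor is not None and identity_anchor not in td.get("identity_anchors", []):
            continue
        result.append(cap)
    return result
```

Filters combine with AND, so an unrelated thread is excluded as soon as any requested anchor or lifecycle state fails to match.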

Salience ranking (#123). List and context-retrieve paths now support deterministic multi-signal salience sorting that surfaces the most decision-relevant capsules first. The sort key combines lifecycle rank, health rank, freshness phase, resume adequacy, verification strength, and recency — all derived from existing capsule state at retrieval time, with no stored ranking metadata. See Payload Reference for the sort key and response structure.
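The multi-signal sort can be expressed as a tuple key compared lexicographically, so earlier signals dominate later ones. The rank tables, the flat field names, and treating recency as an `updated_at` timestamp are hypothetical; the authoritative sort key is in the Payload Reference.

```python
from datetime import datetime

# Hypothetical rank tables: lower rank sorts first (more decision-relevant).
LIFECYCLE_RANK = {"active": 0, "suspended": 1, "concluded": 2, "superseded": 3}
HEALTH_RANK = {"healthy": 0, "degraded": 1}
PHASE_RANK = {"fresh": 0, "aging": 1, "stale": 2, "expired": 3}
ADEQUACY_RANK = {"adequate": 0, "partial": 1, "insufficient": 2}
VERIFICATION_RANK = {"strong": 0, "weak": 1, "none": 2}


def salience_key(cap: dict) -> tuple:
    """Deterministic sort key built only from state already on the capsule."""
    updated = datetime.fromisoformat(cap["updated_at"].replace("Z", "+00:00"))
    return (
        # Unknown values rank after every known value.
        LIFECYCLE_RANK.get(cap.get("lifecycle"), len(LIFECYCLE_RANK)),
        HEALTH_RANK.get(cap.get("health"), len(HEALTH_RANK)),
        PHASE_RANK.get(cap.get("phase"), len(PHASE_RANK)),
        ADEQUACY_RANK.get(cap.get("resume_adequacy"), len(ADEQUACY_RANK)),
        VERIFICATION_RANK.get(cap.get("verification"), len(VERIFICATION_RANK)),
        -updated.timestamp(),  # newer first when all other signals tie
        cap["subject_id"],  # final tie-breaker keeps the order total and stable
    )
```

Usage: `sorted(capsules, key=salience_key)`. Since the key is recomputed from current capsule state at retrieval time, no ranking metadata has to be stored.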

Stable preferences (#124). User and peer capsules can carry up to 12 stable preferences — explicit, user-stated standing instructions that persist across unrelated threads (e.g., “always use metric units”, “UTC+2 timezone”). Distinct from the agent’s inferred relationship_model. See Payload Reference for the model.

Rationale entries (#122). ContinuityState now supports up to 6 structured rationale entries capturing decision reasoning, assumptions, and unresolved tensions with a kind/status lifecycle and supersession semantics. This preserves why alongside what. See Payload Reference for the model.

Startup view (#165). POST /v1/continuity/read accepts view="startup" to return a pre-structured startup_summary with recovery, orientation, and context tiers alongside the unchanged full capsule. This is a mechanical extraction — no additional I/O. See Payload Reference for the response shape.

Session-end snapshot (#167). POST /v1/continuity/upsert accepts a session_end_snapshot that merges fresh startup-critical fields into the base capsule before persistence, reducing caller burden at session end. See Payload Reference for the merge algorithm.

GET /v1/capabilities (#179). A versioned, machine-readable feature map that allows agents to discover what the current instance supports before building integration logic. It covers the continuity enhancements above plus graph context, schedule reminders, coordination, messaging, peers, and discovery surfaces. See API Surface for the endpoint contract.
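A sketch of how an agent might gate optional integration logic on the capabilities document. The response shape assumed here (a top-level `features` object of booleans) and the feature keys in the usage note are hypothetical; see API Surface for the real contract.

```python
import json


def supported_features(capabilities_json: str, wanted: list[str]) -> dict[str, bool]:
    """Report which wanted features the instance advertises (assumed response shape)."""
    doc = json.loads(capabilities_json)
    features = doc.get("features", {})
    # Anything missing or not exactly True counts as unsupported, so an older
    # instance sends the agent down the conservative path instead of failing later.
    return {name: features.get(name) is True for name in wanted}
```

For example, checking a hypothetical `continuity.startup_view` key before requesting view="startup" lets the same agent code run against instances with and without that feature.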

Mechanical Assistance and Agent Authorship

CogniRelay provides bounded mechanical assistance for continuity maintenance, but does not generate, infer, or synthesize semantic content. The division is strict: CogniRelay handles structural operations deterministically; agents remain solely responsible for meaning-bearing content.

What CogniRelay Handles Mechanically

| Capability | Surface | What the system does |
|---|---|---|
| Preserve-by-default field retention | POST /v1/continuity/upsert with merge_mode="preserve" | Carries forward omitted fields from the stored capsule so agents can update a subset without re-sending the full capsule. |
| Bounded partial list updates | POST /v1/continuity/patch | Appends, removes, or replaces individual items in list fields atomically without rewriting the full list. |
| Standalone lifecycle transitions | POST /v1/continuity/lifecycle | Transitions thread_descriptor.lifecycle without a full capsule upsert. |
| Write-path normalization | All continuity write endpoints | Deduplicates, trims, and normalizes fields deterministically; reports what fired via normalizations_applied. |
| Fallback snapshot refresh | All continuity write paths (upsert, patch, lifecycle, revalidate) | Refreshes the last-known-good fallback snapshot after each successful active write. |
| Trust signal computation | Read and retrieve paths | Derives recency, completeness, integrity, and scope-match signals mechanically from stored capsule state. |
| Deterministic trimming | Read and retrieve paths under token budget | Trims lower-priority fields in a fixed order to fit the token budget; reports what was trimmed. |

What Agents Must Author Explicitly

All semantic content — the meaning-bearing orientation that makes a capsule useful — is authored by the agent. CogniRelay stores, merges, and retrieves it but never generates it.

| Content | Why it requires agent authorship |
|---|---|
| stance_summary | Captures the agent’s current analytical or operational position in its own terms. |
| source (agent identity, update reason) | Only the agent knows who it is and why it is writing. |
| confidence | Only the agent can assess its own certainty. |
| top_priorities, active_concerns, active_constraints | Semantic judgments about what matters and what limits apply. |
| open_loops, drift_signals | The agent identifies what is unresolved and what has shifted. |
| rationale_entries | Structured decision reasoning: why the agent chose what it chose. |
| stable_preferences | Explicit standing instructions the agent or user provides. |
| negative_decisions | What the agent deliberately chose not to do. |
| working_hypotheses, long_horizon_commitments | Speculative or durable analytical content. |
| session_trajectory, trailing_notes, curiosity_queue | Session-specific direction, low-commitment observations, and open questions. |
| relationship_model | The agent’s inferred model of the user or peer relationship. |
| Thread/task labels, keywords, scope anchors, identity anchors | Semantic identity of the thread or task. |

Capsule-level structural fields (attention_policy, freshness, canonical_sources, metadata, stable_preferences, and thread_descriptor) are also agent-authored and preserve-eligible: when omitted in preserve mode, they are carried forward from the stored capsule. Other capsule-level fields (verification_kind, verification_state, capsule_health) are not preserve-eligible and must be provided explicitly when needed. The continuity-state-level retrieval_hints field is likewise agent-authored. CogniRelay stores all of these but never generates or infers their values.

The system never infers, summarizes, or generates any of these fields. When an agent omits a field in preserve mode, CogniRelay carries forward the previously stored value — it does not fill in a new one.

Examples

Preserve-mode upsert — update stance and priorities, carry forward everything else:

{
  "subject_kind": "thread", "subject_id": "refactor-auth",
  "merge_mode": "preserve",
  "capsule": {
    "subject_kind": "thread", "subject_id": "refactor-auth",
    "updated_at": "2026-03-29T10:00:00Z",
    "verified_at": "2026-03-29T10:00:00Z",
    "source": {"producer": "coder-1", "update_reason": "manual"},
    "confidence": {"continuity": 0.9, "relationship_model": 0.8},
    "continuity": {
      "stance_summary": "Auth module extracted; integration tests next.",
      "top_priorities": ["Write integration tests for new auth service"],
      "active_concerns": [],
      "active_constraints": [],
      "open_loops": [],
      "drift_signals": []
    }
  }
}

Required list fields sent as [] signal “preserve the stored value” in preserve mode. Optional fields omitted entirely are also preserved. Capsule-level fields (stable_preferences, attention_policy, freshness, etc.) that are absent from the request are carried forward from the stored capsule. See Preserve-by-default merge for the full field-intent rules.
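The field-intent rules above can be sketched as a small merge function. The `REQUIRED_LISTS` set is inferred from the example request and the `PRESERVE_ELIGIBLE` set is copied from the paragraph on capsule-level fields; both, and the function name, are assumptions for illustration rather than the server's implementation.

```python
# Assumptions: REQUIRED_LISTS inferred from the example above; PRESERVE_ELIGIBLE
# copied from the capsule-level field list. Authoritative rules live in
# "Preserve-by-default merge".
REQUIRED_LISTS = ("top_priorities", "active_concerns", "active_constraints",
                  "open_loops", "drift_signals")
PRESERVE_ELIGIBLE = ("attention_policy", "freshness", "canonical_sources",
                     "metadata", "stable_preferences", "thread_descriptor")


def preserve_merge(stored: dict, incoming: dict) -> dict:
    """Merge an incoming capsule over a stored one in preserve mode (sketch)."""
    merged = dict(incoming)
    # Capsule level: carry forward omitted preserve-eligible fields only.
    # Non-eligible fields such as verification_state are NOT carried forward.
    for name in PRESERVE_ELIGIBLE:
        if name not in incoming and name in stored:
            merged[name] = stored[name]
    old_c = stored.get("continuity", {})
    new_c = dict(incoming.get("continuity", {}))
    for name, value in old_c.items():
        if name not in new_c:
            new_c[name] = value  # omitted field: keep the stored value
        elif name in REQUIRED_LISTS and new_c[name] == []:
            new_c[name] = value  # [] on a required list means "preserve"
    merged["continuity"] = new_c
    return merged
```

One consequence of these rules: in preserve mode, sending [] cannot clear a required list, so clearing one presumably needs a patch operation or a non-preserve upsert.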

Patch — append one open loop without rewriting the list:

{
  "subject_kind": "thread", "subject_id": "refactor-auth",
  "updated_at": "2026-03-29T10:05:00Z",
  "operations": [
    {"target": "continuity.open_loops", "action": "append", "value": "Verify token rotation under new auth flow"}
  ]
}
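A sketch of applying patch operations all-or-nothing: every operation runs against a deep copy, and any failure raises before anything is returned. The `target`, `action`, and `value` keys follow the example above; the `match` key used by `replace` is a hypothetical shape, since the real replace contract is in the Payload Reference.

```python
import copy


def apply_patch(capsule: dict, operations: list[dict]) -> dict:
    """Apply list operations to a copy; raise if any op fails (all-or-nothing)."""
    result = copy.deepcopy(capsule)
    for op in operations:
        section, name = op["target"].split(".", 1)  # e.g. "continuity.open_loops"
        items = result.setdefault(section, {}).setdefault(name, [])
        if op["action"] == "append":
            items.append(op["value"])
        elif op["action"] == "remove":
            items.remove(op["value"])  # ValueError if the item is absent
        elif op["action"] == "replace":
            # Hypothetical shape: "match" names the existing item to swap for "value".
            items[items.index(op["match"])] = op["value"]
        else:
            raise ValueError(f"unsupported action: {op['action']}")
    return result
```

Because the caller only persists the returned copy, a failed second operation leaves the stored capsule exactly as it was.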

Lifecycle transition — conclude a thread without a full upsert:

{
  "subject_kind": "thread", "subject_id": "refactor-auth",
  "transition": "conclude",
  "updated_at": "2026-03-29T11:00:00Z"
}

For field-level schemas and constraints, see Payload Reference.

Coordination model

CogniRelay provides three bounded coordination primitives. All are additive records that do not mutate local continuity capsules or automatically synchronize state between agents.

Discovery for all three primitives is bounded by caller identity unless the caller is an admin.

Degradation and recovery model

CogniRelay assumes blind spots are structural and optimizes for bounded usefulness under loss rather than claiming seamless recovery.

Key degradation behaviors:

Operational Boundary

There are two distinct surfaces:

Host-local ops endpoints are intended for loopback or other local trust boundaries, not WAN peer access. In the default model, host-local authority actions are performed by the owner-agent in its operator role. The /v1/ops/* endpoints enforce dual-layer access control (both admin:peers scope and IP-based locality); trust, token, and signing-key lifecycle endpoints require admin:peers scope but do not enforce IP locality. Collaborator peers should not have access to either surface.
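A sketch of the two access-control shapes described above. The function names are hypothetical, and treating "local" as a loopback IP literal is an assumption: a Unix-socket client, which has no IP address, would need its own handling and is simply denied here.

```python
import ipaddress


def allow_ops_request(scopes: set[str], client_host: str) -> bool:
    """Dual-layer gate for /v1/ops/* (sketch): admin:peers scope AND a loopback client."""
    if "admin:peers" not in scopes:
        return False
    try:
        return ipaddress.ip_address(client_host).is_loopback
    except ValueError:
        return False  # not an IP literal (hostname, empty string): deny


def allow_admin_lifecycle_request(scopes: set[str]) -> bool:
    """Trust, token, and signing-key lifecycle: scope check only, no IP locality."""
    return "admin:peers" in scopes
```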

Repository Shape

The runtime repo under data_repo/ is organized around durable memory and collaboration records:

Agent Usage

Startup sequence

For a practical onboarding walkthrough covering both cold-start and incremental integration, see Agent Onboarding. For the hook-based integration pattern summary, see README: Agent Integration Patterns.

For an agent cold start, the full recommended sequence is:

  1. GET /v1/capabilities (optional — confirm which features the instance supports before building integration logic; see API Surface)
  2. GET /v1/discovery
  3. GET /v1/manifest
  4. GET /v1/contracts
  5. GET /v1/governance/policy
  6. GET /health
  7. POST /v1/index/rebuild-incremental when writes occurred since the last cycle
  8. POST /v1/context/retrieve for the active task
  9. GET /v1/tasks/query for shared planning state
  10. GET /v1/messages/pending for tracked delivery state
  11. GET /v1/metrics for backlog, check, and replication health
  12. POST /v1/context/snapshot when reproducible continuation context is needed
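The cold-start sequence above can be kept as data, with the optional and conditional steps expressed as flags. The function name is hypothetical, request bodies are omitted, and how the agent tracks "writes since the last cycle" is left to the agent.

```python
def startup_plan(check_capabilities: bool, writes_since_last_cycle: bool,
                 need_snapshot: bool) -> list[tuple[str, str]]:
    """Return the cold-start calls in order, dropping steps that do not apply."""
    plan = []
    if check_capabilities:  # step 1 is optional
        plan.append(("GET", "/v1/capabilities"))
    plan += [
        ("GET", "/v1/discovery"),
        ("GET", "/v1/manifest"),
        ("GET", "/v1/contracts"),
        ("GET", "/v1/governance/policy"),
        ("GET", "/health"),
    ]
    if writes_since_last_cycle:  # step 7 only when writes occurred
        plan.append(("POST", "/v1/index/rebuild-incremental"))
    plan += [
        ("POST", "/v1/context/retrieve"),
        ("GET", "/v1/tasks/query"),
        ("GET", "/v1/messages/pending"),
        ("GET", "/v1/metrics"),
    ]
    if need_snapshot:  # step 12 only for reproducible continuation context
        plan.append(("POST", "/v1/context/snapshot"))
    return plan
```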

If the runtime prefers MCP-style JSON-RPC, the canonical slice-2 bootstrap sequence is GET /.well-known/mcp.json, then POST /v1/mcp with an initialize request carrying the required protocolVersion. notifications/initialized is still accepted as an optional, notification-only compatibility call.

After initialize succeeds, normal MCP usage may proceed with methods such as tools/list and tools/call.
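A sketch of the JSON-RPC 2.0 bodies for that bootstrap. Requests carry an `id` and expect a response; the notification carries none. The protocol version string is a caller-supplied placeholder (take the real value from GET /.well-known/mcp.json or docs/mcp.md), and other initialize params that the MCP specification defines, such as `capabilities` and `clientInfo`, are omitted here because this section does not say whether CogniRelay requires them.

```python
import itertools
import json
from typing import Optional

_next_id = itertools.count(1)


def jsonrpc_request(method: str, params: Optional[dict] = None) -> dict:
    """JSON-RPC 2.0 request: has an id, so the server sends a response."""
    msg = {"jsonrpc": "2.0", "id": next(_next_id), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


def jsonrpc_notification(method: str) -> dict:
    """JSON-RPC 2.0 notification: no id, so no response is expected."""
    return {"jsonrpc": "2.0", "method": method}


def bootstrap_messages(protocol_version: str) -> list[str]:
    """Bodies to POST to /v1/mcp, in order (sketch)."""
    return [
        json.dumps(jsonrpc_request("initialize", {"protocolVersion": protocol_version})),
        # Optional compatibility call: accepted, not required.
        json.dumps(jsonrpc_notification("notifications/initialized")),
        json.dumps(jsonrpc_request("tools/list")),
    ]
```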

For the complete MCP integration notes, including what is and is not mirrored through the tool catalog, see docs/mcp.md.

Write behavior

Retrieval behavior

Indexing and compaction guidance

Peer and token guidance

Token role access matrix

The following matrix summarizes what each token role can access. The owner token is the default token for the agent running the instance. The governance policy exposes collaboration_peer and replication_peer as baseline templates for issued tokens.

| Capability | Owner (admin:peers) | collaboration_peer | replication_peer |
|---|---|---|---|
| Read continuity capsules (memory/continuity) | Yes | No | Yes (wildcard read) |
| Write continuity capsules | Yes | No | Yes (via memory write namespace) |
| Read core/episodic memory (memory/core, memory/episodic) | Yes | No | Yes (wildcard read) |
| Read coordination artifacts (memory/coordination) | Yes | Yes | Yes |
| Write coordination artifacts (requires write:projects) | Yes | Yes | No (write:projects not granted) |
| Read tasks (tasks) | Yes | Yes | Yes |
| Write tasks (requires write:projects) | Yes | Yes | No (write:projects not granted) |
| Read messages (messages) | Yes | Yes | Yes |
| Write/send messages | Yes | Yes | Yes |
| Search and index | Yes | Yes | No (search not granted) |
| Manage tokens (issue/revoke/rotate) | Yes | No | No |
| Manage peer trust | Yes | No | No |
| Rotate signing keys | Yes | No | No |
| Replication sync (pull/push) | Yes | No | Yes |
| Run ops jobs (/v1/ops/*, requires localhost) | Yes | No | No |
| Risk if token is leaked | Total compromise | Coordination, task, and message exposure (no continuity/admin access) | Read-all data access; writes scoped to replication prefixes; no administrative capability |

Owner “Yes” entries in capability rows reflect admin:peers superuser bypass semantics — admin:peers bypasses both scope checks and namespace/path restrictions, so the owner token passes every authorization gate. The replication_peer template uses the dedicated replication:sync scope with wildcard read namespaces and write namespaces explicitly listing replication-eligible prefixes. It does not carry admin:peers and cannot perform administrative operations.
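A sketch of the authorization order implied above: the admin:peers bypass first, then the scope check, then the namespace/path restriction. The `Token` shape, the `"*"` wildcard form, and path-segment prefix matching are assumptions; only the admin:peers bypass behavior and the scope names that appear in the matrix come from this document.

```python
from dataclasses import dataclass, field


@dataclass
class Token:  # hypothetical token shape
    scopes: set[str]
    read_namespaces: list[str] = field(default_factory=list)
    write_namespaces: list[str] = field(default_factory=list)


def _in_namespaces(path: str, namespaces: list[str]) -> bool:
    """Prefix match on whole path segments; "*" grants every path (assumed form)."""
    for ns in namespaces:
        if ns == "*":
            return True
        ns = ns.rstrip("/")
        if path == ns or path.startswith(ns + "/"):
            return True
    return False


def authorize(token: Token, required_scope: str, path: str, write: bool) -> bool:
    """Scope check plus namespace/path restriction, with the admin:peers bypass."""
    if "admin:peers" in token.scopes:
        return True  # superuser bypass: skips BOTH scope and namespace checks
    if required_scope not in token.scopes:
        return False
    allowed = token.write_namespaces if write else token.read_namespaces
    return _in_namespaces(path, allowed)
```

Matching on whole segments keeps a grant for memory/coordination from leaking into a sibling such as memory/coordination-archive.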

Operators can issue custom tokens with any combination of scopes and namespace restrictions. The templates above are baselines, not the only options.

Host-local authority boundary

Treat the following as host-local authority actions rather than normal remote peer operations:

If automated, run these through a local scheduler such as systemd or cron and invoke the service through a local boundary such as 127.0.0.1 or a Unix socket.

Failure handling and observability

How To Navigate The Docs