Skip to content

Apollo — System-Wide Observation, Learning, and Guidance Layer

Purpose

Apollo is the reasoning and memory layer over the platform's LLM activity. It sits inside oracle, observes every LLM and tool interaction across a three-layer system, distills durable artifacts from those observations, and exposes learned guidance back to the layers that need it.

Apollo is an observer, learner, and advisor. It does not execute workflows, does not call tools, does not retry failed requests, and does not interrupt live LLM calls. Iteration is driven by layer 1 (the front-end prompt/schema generator); Apollo's role is to make each successive iteration better informed than the last.

Apollo has its own LLM, its own memory, and an autonomous curator that maintains its own artifacts — but empowerment is strictly bounded to Apollo's internal state. Apollo cannot change auth, guardrails, token scopes, or user data.

Three-Layer Context

┌──────────────────────────────────────────────────────────────┐
│  Layer 1: Front-end                                          │
│    - generates prompts + schemas for requests                │
│    - consumes Apollo's guidance to shape future prompts      │
│    - decides when to re-run a workflow                       │
└────────────────────┬─────────────────────────────────────────┘
                     │  intent, prompt, schema
                     ▼
┌──────────────────────────────────────────────────────────────┐
│  Layer 2: Oracle / Apollo                                    │
│    Oracle: auth, routing, LLM dispatch, tool aggregation     │
│    Apollo: observe, learn, advise, curate                    │
└────────────────────┬─────────────────────────────────────────┘
                     │  tool calls, sub-LLM calls
                     ▼
┌──────────────────────────────────────────────────────────────┐
│  Layer 3: Backend agents + libraries                         │
│    - agents (parallax, cortex, ...) — LLM-driven             │
│    - libraries (UDS, athena, ...) — operational, no LLM      │
│    - execute domain logic, return outputs                    │
│    - oracle observes the MCP round-trip and emits on their   │
│      behalf (L3 never addresses Apollo directly)             │
│    - agents consume injected guidance; libraries do not      │
└──────────────────────────────────────────────────────────────┘

Apollo observes the full lineage of each request: intent (layer 1) → routing and LLM reasoning (layer 2) → execution and outcome (layer 3) → final response back to layer 1.

Communication with L1 and L3 is unidirectional and oracle-mediated. Oracle is Apollo's sole emitter: it extracts L1 signals from /chat request bodies and observes L3 outputs across the MCP round-trip, calling oracle.oracle.observer.ingest in-process on both layers' behalf — L1 and L3 never address Apollo directly (§Invariants 14, §Ingest Semantics). Guidance flows back the same way, attached to envelopes already travelling: onto /chat responses for L1, onto outbound MCP dispatches for agent-kind L3 services, all on the envelope's ambient auth with no service token or long-lived connection (§Injection Channel). Oracle is also a guidance subscriber for its own chat LLM (L2): the tool-executor at oracle/server/llm/tool_executor.py consults a process-local ApolloGuidanceCache each turn, no transport involved (§L2 path). Admins are the only exception — they use GET /api/v1/apollo/guidance for inspection and may POST to /api/v1/apollo/observations for replay/seed.

Apollo does not observe L1's or L3's internal LLM turns. llm_turn events are emitted by oracle (L2) only. Apollo learns about L1, L2, and L3 LLMs indirectly — from L1/L3 outputs (intent_schema / user_prompt for L1; tool_output / tool_error for L3), from oracle's own llm_turn and final_response (L2), and from outcome correlation — and improves them prospectively by injecting updated guidance into their prompt context.

Trace Propagation

Apollo relies on a single trace_id shared across every observation emitted for one end-to-end request. Trace propagation follows the W3C Trace Context standard (traceparent header) end-to-end across L1, L2, and L3.

This is the concrete realization of the OpenTelemetry aspiration noted in SPEC-PLATFORM-03 (Oracle) and introduces no conflict with axonis-core: neither axonis-core nor SPEC-PLATFORM-01/02/03 currently defines a trace, request-id, or correlation-id header. The only header propagation today is Authorization via axonis_core.gateway.client.extract_http_headers() (SPEC-01). Apollo's adoption is additive.

Header

traceparent: 00-<trace-id 32 hex>-<parent-span-id 16 hex>-<flags 2 hex>

Format: W3C Trace Context Level 1. Apollo uses only the trace-id segment for lineage stitching; parent-span-id and flags are preserved for standards compliance and future OpenTelemetry interop but are not interpreted by Apollo's lineage layer.

Who mints

  • L1 mints the root traceparent on every new request and sets it on the outbound HTTP call to oracle /chat (and equivalent endpoints). L1 does not call Apollo directly (§Invariants 14); oracle re-emits L1-origin observations in-process, reusing the same trace-id.
  • If oracle receives a request without a traceparent header (e.g., a pre-W3C client), oracle mints one, logs a missing_traceparent telemetry event, and surfaces the minted trace-id in the response so callers can correlate if they choose.

How it travels

Hop Carrier
L1 → L2 (HTTP to oracle) traceparent request header
L2 → L3 (MCP tool dispatch) traceparent HTTP header on the POST to the service's MCP endpoint (same transport as the existing Authorization forward)
L2 → L3 (HTTP fallback, non-MCP) traceparent request header
L3 → L2 (MCP response → oracle) traceparent is preserved by oracle's MCP client across the round-trip; oracle stamps the same trace_id on the tool_output / tool_error envelope it emits in-process
Admin seed → Apollo (POST /observations) traceparent request header and trace_id field in the envelope
Out-of-process emitter → Apollo (secondary) traceparent request header and trace_id field in the envelope (envelope is authoritative)

Oracle is the only L2 hop and is responsible for forwarding the inbound traceparent unchanged on every downstream call that belongs to the same request. Oracle never re-mints mid-request.

axonis-core integration

Trace header propagation ships as an additive change to axonis-core — it lives with the existing cross-service header plumbing, not in oracle-only code:

  • axonis_core.gateway.client.extract_http_headers() — extended to forward traceparent alongside Authorization. This is the single source of truth for cross-service header propagation and is used by both MCPClient and RestClient.
  • axonis_core.gateway.mcp_client.MCPClient — reads traceparent from the inbound request context and sets it as an HTTP header on outbound MCP POSTs, alongside the existing Authorization forward.
  • axonis_core.gateway.rest_client.RestClient — reads traceparent from the inbound context and sets it as an HTTP header on outbound REST calls.
  • ApolloClient (SPEC-PLATFORM-14 §Ingest Semantics, in axonis-core) — used by admin replay and any future out-of-process emitter; reads the ambient traceparent from request context and sets it as an HTTP header on every POST /api/v1/apollo/observations call and into the envelope's trace_id field (the envelope wins on conflict — see §Envelope mapping). Phase-1 emitters do not use this client; oracle emits in-process and carries trace_id on the envelope it builds directly.

No new dependency is added to axonis-core — parsing the 4-segment traceparent string is a handful of lines; no OpenTelemetry SDK is required. A future OpenTelemetry integration can consume the same header without change.

Envelope mapping

Apollo's observation envelope fields map to W3C Trace Context as follows:

Envelope field W3C source Purpose
trace_id traceparent.trace-id (32-hex) Shared by all events for one end-to-end request
parent_trace_id not derived from traceparent Set by emitter only when this trace is a sub-request spawned from a separate enclosing trace (e.g., a scheduled background workflow). Null otherwise.

parent_trace_id is not the same as W3C parent-span-id. Apollo does not track span hierarchy within a single trace — its per-event observation cadence (§Observation Model → Observation cadence) makes span-level granularity unnecessary. parent_trace_id is used only for cross-trace fork linkage.

Configuration

  • APOLLO_TRACE_HEADER — header name. Default traceparent (W3C). Configurable only to ease staged rollout against pre-W3C emitters; always traceparent in production.
  • APOLLO_REQUIRE_TRACEPARENT — when true, oracle rejects inbound requests without a valid traceparent. Default false through Phases 1–2 (oracle mints on absence). Flip to true in Phase 3 alongside APOLLO_REQUIRE_INTENT_SCHEMA once emitter coverage is proven.

Failure posture

  • Missing header (best-effort): oracle mints, logs missing_traceparent, serves the request. Lineage still stitches because the minted trace-id flows downstream and is used by oracle's own observations.
  • Missing header (required mode): oracle rejects with 400; emitter must include traceparent.
  • Malformed header: oracle rejects with 400 in required mode; logs malformed_traceparent and mints a replacement in best-effort mode.
  • Envelope trace_id differs from header: the envelope value wins — it is the emitter's authoritative signal. Oracle logs the discrepancy for diagnostics.

Package Structure

Apollo lives inside oracle as a set of subsystem packages (after the apollo/oracle/ flatten), mounted into oracle's Starlette app at /api/v1/apollo/*. It is not a separate service and has no __main__.py of its own — the oracle invariant ("oracle is the only externally exposed service", SPEC-03 §Invariants 1) is preserved. Subsystems, all under oracle/ in the oracle repo:

  • oracle/observer/ — observation intake: ingest.py (normalize → memory write → evaluator fan-out), events.py (Pydantic event models).
  • oracle/memory/store.py: axonis-core UDS stores for every apollo_* index.
  • oracle/learner/synthesis.py (event-driven LLM synthesis), graphs.py (Decision Graphs), extractors.py, snapshots.py, drift.py (graph-anchor check), prompts.py, coalescer.py, similarity.py.
  • oracle/guidance/api.py (admin inspection endpoints), attacher.py (in-process attach to /chat + MCP dispatch), selectors.py (intent → artifact matching).
  • oracle/curator/actions.py, policy.py, audit.py, auto.py (autonomous Curator).
  • oracle/evaluator/scoring.py, signals.py, cascade.py, attribution.py, persist.py, recommendations.py.
  • oracle/lineage/ — cross-trace attribution persistence (apollo_lineage_events).
  • oracle/chat/ — admin-only conversational interface + its tools.
  • oracle/artifacts.py, oracle/schema.py, oracle/llm.py — typed artifact schemas, Schema/INDICES, Apollo's own LLM client.

Routes mount from server/__main__.py.

Observation Model

Event types

Apollo recognizes the following event types, emitted by oracle and backend services:

Event type Emitter Purpose
intent_schema Oracle (from L1 /chat request body) Front-end's generator schema for this request
user_prompt Oracle (from L1 /chat request body) Concrete prompt produced from the intent schema
llm_turn Oracle (layer 2) One LLM request/response cycle inside oracle
tool_output Oracle (from L3 MCP response) Successful tool execution: inputs, outputs, latency
tool_error Oracle (from L3 MCP response) Tool failure: inputs, error message, stack trace, latency
final_response Oracle What was returned to layer 1 at the end of a conversation turn
user_feedback Oracle (from L1 feedback submission) Thumbs up/down, correction, explicit follow-up signal

Emission paths are covered in detail in §Ingest Semantics. In summary: every Phase-1 event (L1-origin, L2-origin, and L3-origin) is emitted by oracle in-process — neither L1 nor L3 addresses Apollo directly (§Invariants 14). POST /api/v1/apollo/observations remains mounted for admin replay/seed and for future services running outside oracle's MCP dispatch reach.

Observation payload

All observations share a common envelope — ObservationEnvelope (oracle/observer/events.py), the only shape Apollo accepts on either ingest path. Top-level fields: event_type (the EventType enum), lineage (trace_id, parent_trace_id, conversation_id), service, timestamp, the two attribution axes caller_identity and emitted_by, and an event-specific payload (one of the typed _Payload subclasses in the same module).

caller_identity vs emitted_by. Apollo records two attribution axes per observation:

  • caller_identity — application-asserted. Who the work is attributed to. Set by the emitter (often a service token stamping observations on behalf of an end user — e.g., cortex emits caller_identity.username="alice" because alice's /chat request fanned out to cortex). The handler stamps this from the Bearer token only when the envelope didn't carry one.
  • emitted_by — server-stamped, unforgeable. Who actually pushed the bytes. Always overwritten by Apollo's ingest handler (HTTP path) or in-process emit helper (oracle hosts Apollo). Carries the validated token subject, roles, and a context ("http" or "in_process"). Emitters cannot forge it; the handler ignores any inbound value and stamps from request.state.token_payload.

Audit query: rows where caller_identity.username != emitted_by.token_subject and emitted_by.token_subject is not a known service principal → flag for review. The two-axis model preserves the legitimate cross-attribution pattern (services emitting per-user observations) while making forging detectable.

Observation cadence (locked)

Apollo records one observation per: - Turn boundary (each LLM request/response cycle) - Tool invocation (tool_output or tool_error) - Error - Final response returned to layer 1

Apollo does not record per-token events. Token-level observation is too noisy for the learner and would cause drift in learned artifacts. This is a drift-prevention decision.

Lineage

Every observation carries a trace_id derived from the W3C traceparent header propagated across L1 → L2 → L3 (§Trace Propagation). Related observations (all events from one end-to-end request) share the same trace_id. Cross-trace sub-requests (e.g., scheduled background workflows spawned from a chat turn) use parent_trace_id for hierarchy.

Memory Model

Apollo's memory is two-tiered:

  1. Raw observations — the events listed above, stored in the Elastic apollo_observations index. High volume, time-boxed retention.
  2. Learned artifacts — structured, versioned objects produced by the Learner. Stored in the Elastic apollo_artifacts index. Low volume, long-lived.

Both indices use the Memory(UDS) class from axonis_core.userspace.intelligence as their UDS primitive (SPEC-01 §Memory Pattern), specialized via subclassing. Apollo does not re-implement the storage surface.

Artifact types

Artifact Description
DecisionGraph A specialized graph of decision points and transitions (see §Decision Graphs)
DecisionTrajectory Smoothed trajectory of a graph's evolution over time
DriftEvent Flagged structural shift in a decision graph requiring review or explanation
IntentPattern Recurring front-end intent → successful tool/service routing and output shape
IntentSchema Known layer 1 generator schemas Apollo has learned to recognize
SchemaDrift Layer 1 started emitting a new or changed schema — flagged for admin review
PromptShape Recurring prompt structure correlated with good/bad outcomes
ToolPairingHint "Tool X is usually followed by tool Y in successful runs"
FailurePattern Known failure mode with diagnostic signature and recommended remediation
ServiceConnectionHint "For intent of class Q, service S gives better results than service S'"
SpecFragment Short, targeted spec snippet relevant to a class of intent
PromptShim System-prompt addition that improves outcomes for a class of intent
CapabilityMap Distilled view of which services can satisfy which intents

Each artifact is a Pydantic model in apollo/artifacts.py backed by a UDS class. Artifacts are versioned — see §Curator.

Index mappings and templates

Every Apollo index is a flat Elasticsearch index (not a data stream, no ILM policy). Mappings are shipped as JSON templates under oracle/oracle/templates/, following the same convention as athena/core/templates/*_mapping.json:

  • apollo_observations_mapping.json
  • apollo_artifacts_mapping.json
  • apollo_artifact_history_mapping.json
  • apollo_graph_nodes_mapping.json
  • apollo_graph_edges_mapping.json
  • apollo_graph_snapshots_mapping.json
  • apollo_audit_mapping.json

Every mapping includes the standard UDS block (uds.timestamp, uds.username, uds.visibility), create_ts, update_ts, schema_version, and — for time-limited indices — an expires_ts date field (same pattern as athena/core/templates/memory_mapping.json). Every index follows the Memory(UDS) / Elastic base-class pattern from SPEC-01 so that CRUD goes through axonis_core.elastic.Elastic.

Retention

Retention is application-managed, not Elastic-ILM-managed. This matches the codebase convention: axonis-core and athena do not configure ILM policies, rollovers, or data streams. Each Apollo document that has a bounded lifetime carries an expires_ts field; a periodic maintenance task (see below) runs Elastic.delete_by_query filtering on expires_ts < now() to reclaim space.

Class Index Expiry mechanism Retention
Raw observations apollo_observations expires_ts = create_ts + 30d set on write 30 days
Graph snapshots (hot) apollo_graph_snapshots expires_ts set by coarsening task (see below) Hourly granularity for 7 days
Graph snapshots (warm) apollo_graph_snapshots Daily snapshots retained after coarsening Daily granularity for 30 days
Graph snapshots (cold) apollo_graph_snapshots Weekly snapshots retained after coarsening Weekly granularity for 90 days total
Learned artifacts apollo_artifacts No expires_ts — lifecycle driven by Curator Indefinite; forgotten by admin or Evaluator-demoted N cycles
Artifact history apollo_artifact_history No expires_ts Indefinite (rollback substrate)
Audit log apollo_audit expires_ts = create_ts + 90d or null for indefinite ≥ 90 days (configurable)

Maintenance task. A periodic background job (default hourly, configurable via APOLLO_MAINTENANCE_INTERVAL) performs: 1. delete_by_query on any index where expires_ts < now() 2. Coarsening on apollo_graph_snapshots: hourly rows older than APOLLO_SNAPSHOT_HOURLY_TO_DAILY_AGE_DAYS (default 7) are grouped by (graph_id, calendar date); the most recent row in each group is re-tagged tier="daily" and the rest deleted. Same shape at the daily→weekly boundary: dailies older than APOLLO_SNAPSHOT_DAILY_TO_WEEKLY_AGE_DAYS (default 30) collapse to one weekly row per (graph_id, ISO week). Both windows are env-overridable; see apollo/settings.py for the documented operator profiles, validation rules, and storage trade-offs. 3. Optional Learner-driven compaction of observations near TTL into apollo_artifacts summaries (event-driven: compaction runs on admin-initiated synthesis or guidance miss, not in this maintenance pass).

The maintenance task uses axonis_core.elastic.Elastic.delete_by_query — no Apollo-specific Elastic client.

Learner

Apollo's Learner is LLM-driven, graph-anchored. Apollo's LLM (see §Apollo's LLM) is the primary engine of synthesis: it processes observations as they arrive (event-driven — see §LLM synthesis below), creates and refines artifacts, classifies intents, diagnoses outcomes, and drives admin chat. The decision graphs are supplemental — they provide deterministic grounding that keeps the LLM anchored and prevents it from drifting.

The relationship is: the LLM reasons flexibly; the graphs remember rigidly. Every LLM call reads the relevant graph state as grounding context. Every LLM output is checked against the graph's trajectory. The LLM cannot propose a pattern that contradicts what the graphs have deterministically recorded without being flagged as drift.

Decision Graphs

Apollo maintains a series of specialized graphs rather than one monolithic graph. Each graph captures a different decision surface:

Graph Nodes Edges
intent_tool_graph Intent classes, tool identifiers "Intent → tool chosen" with outcome weight
prompt_shape_graph Prompt structure clusters "Shape A evolved into shape B in later iteration"
service_routing_graph Intent classes, backend services "Intent → service picked" with outcome weight
outcome_graph Decision points, outcome classes "Decision → outcome produced" with frequency
iteration_graph States within a layer-1 re-run chain "Iteration N → Iteration N+1 decision delta"

Cross-graph links exist where decisions in one graph point to nodes in another (e.g., a tool-selection node in intent_tool_graph links to the outcome node in outcome_graph).

Node and edge model

Each node carries: - id, graph_id, kind, label - Occurrence count, first-seen / last-seen timestamps - Outcome distribution (aggregated from incoming observations) - Tags for retrieval

Service-namespaced labels

Every label that is naturally service-scoped is prefixed with the emitting service: <envelope.service>/<label>. Concretely, the extractors namespace labels for the following node kinds:

Graph Kind Example label
intent_tool_graph intent, tool cortex/screening, cortex/summarize
prompt_shape_graph prompt_shape oracle/shape:20:a3f1b2c0
service_routing_graph intent parallax/screening
outcome_graph decision_point cortex/tool:summarize, oracle/conversation:conv_42
iteration_graph iteration_state oracle/iter:trc_1234

Two node kinds are intentionally not prefixed:

  • service nodes in service_routing_graph carry the service name itself as their identity (e.g., bare cortex). Prefixing would yield the meaningless label cortex/cortex.
  • outcome nodes in outcome_graph carry universal categorical labels (success, error, feedback_up, feedback_down, feedback_abandoned). The per-service split is carried by the decision_point side of the edge, not by fragmenting the outcome taxonomy.

This rule means two backend services that register a tool with the same name (e.g. cortex/summarize and parallax/summarize) form distinct nodes and accumulate counts, EWMA weights, and outcome distributions independently. Downstream synthesis (M8), drift detection (M12), and evaluator scoring (M10) therefore operate on per-service signal rather than a cross-service average.

Each edge carries: - source_id, target_id - Weight (an outcome-correlation-adjusted transition probability) - Count, first-seen / last-seen - Recent-window weight (exponentially-weighted moving average over a short horizon) - Long-window weight (EWMA over a long horizon)

The divergence between recent-window and long-window weights is the primary drift signal.

Graph updates (per observation, deterministic)

The Learner's extractors run deterministically on every ingested observation:

  1. Extract decision points (e.g., "intent class", "tool called", "service routed", "outcome class") using rules and lightweight matchers.
  2. Upsert nodes: create new if absent, increment count and update last-seen if present.
  3. Upsert edges: create new or reinforce. Update short-window and long-window weights.
  4. Attach the observation's trace_id to the affected nodes/edges for lineage queries.

No LLM call. No new free-form artifacts. Graph mutations only. This path is the grounding layer — it records what has actually happened in the system, with no interpretation.

Snapshots and trajectory

  • Snapshots. Each graph is snapshotted on a cadence (default: hourly; configurable via APOLLO_GRAPH_SNAPSHOT_INTERVAL) into the Elastic apollo_graph_snapshots index. Snapshots are the substrate for past-vs-current comparison.
  • Trajectory. A projection of near-future graph state from current EWMA velocities. Used by Guidance to pre-warm likely-next decisions and by drift detection to establish an expected trajectory.

LLM synthesis (event-driven, primary driver)

Apollo's LLM runs the primary synthesis engine and is event-driven, not scheduled. It is invoked in response to specific observation events — not on a timer, not on a batch threshold. The cadence of synthesis matches the cadence of actual system activity.

Synthesis triggers.

Trigger Inbound event
Layer 1 sends a request intent_schema or user_prompt observation ingested
Layer 3 returns an output tool_output, tool_error, or final_response ingested
Admin chat turn POST /api/v1/apollo/chat request
Admin-initiated synthesis POST /api/v1/apollo/learn request

Other observation types (llm_turn from oracle itself) feed the graphs but do not trigger synthesis on their own — they are intermediate steps between a Layer 1 request and a Layer 3 output. Novel-intent synthesis occurs naturally on the Layer 1 / Layer 3 triggers above; no GET /guidance request from L1 or L3 ever drives synthesis (§Invariants 14).

Inputs on each synthesis call. - The triggering observation (or chat turn) - The relevant subgraph state from each decision graph (grounding context) - Active artifacts that match the observation's intent/tool/service fingerprints - Recent evaluator scores for matched artifacts - Prior synthesis output for the same trace_id, if any (for continuity within a request lineage)

Outputs. - Proposed new artifacts (IntentPattern, FailurePattern, etc.) - Proposed edits to existing artifacts - Proposed promotions/demotions - Drift flags when the LLM itself detects divergence - Compaction proposals for old observations near TTL - For admin chat and admin-initiated triggers: a direct response returned to the caller

All outputs are structured Pydantic models. The Curator commits them only after the graph-anchor drift check (below) clears.

Concurrency. A burst of Layer 3 tool outputs (e.g., a fusion run with many tool calls) can trigger many near-simultaneous synthesis calls. Apollo bounds concurrent synthesis via APOLLO_SYNTHESIS_MAX_CONCURRENT (default 4) with a queue of pending triggers. Duplicate triggers within the same trace_id are coalesced: only the latest observation in a lineage is processed.

Graph-anchor drift check

The graphs are the anti-drift mechanism. Every LLM synthesis output is validated against the graphs before the Curator commits it:

  • Proposed pattern vs. recorded edges. If the LLM proposes "tool X is typically followed by tool Y" but the intent_tool_graph edge X→Y has low weight or is absent, the proposal is flagged.
  • Proposed intent classification vs. node clusters. If the LLM introduces an intent class that does not correspond to any node cluster in the graphs, flagged.
  • Weight swings. If the LLM's proposal would effectively invert a strongly-weighted edge, flagged — even if the LLM's reasoning is plausible, this is exactly the shape of drift.
  • Trajectory coherence. If the LLM's proposed trajectory diverges from the graph's EWMA projection, flagged.

Flagged outputs produce a DriftEvent artifact. The Curator does not commit a flagged proposal autonomously — admin review is required via chat or the audit surface. This is how the graphs protect the LLM from itself.

Drift vs. evolution

The graph-anchor check distinguishes:

  • Evolution — LLM synthesis outputs consistent with graph trajectory; graph weights shift smoothly as observations accumulate. Proposals are committed autonomously by the Curator.
  • Drift — LLM synthesis outputs diverge from graph state; sudden edge-weight swings; emergent nodes appearing faster than configured rate caps. Proposals are held for admin review.

Thresholds are per-graph and configurable (z-score on weight deltas, rate-of-new-nodes caps, divergence tolerance on LLM outputs).

Storage

  • Graph nodes and edges live in the Elastic apollo_graph_nodes and apollo_graph_edges indices (UDS-backed, per SPEC-01 invariant 2).
  • A working in-memory mirror of the active graphs is maintained for hot-path reads (guidance, drift detection). The in-memory mirror is derived state; it is always rebuildable from Elastic.
  • Snapshots live in apollo_graph_snapshots. Snapshots are immutable after write.

Guidance API (admin inspection only)

Apollo delivers guidance to L1 and L3 LLMs exclusively via the response-attached Injection Channel (§Injection Channel); those layers never pull it — there are no GET calls from L1/L3 in the runtime path. The GET /guidance* endpoints below are retained only as admin inspection tools — admins and admin-chat tooling preview what Apollo would currently inject for a given intent, layer, or subscriber. They require role admin via oracle's guardrails (SPEC-03 §Guardrails). L3 operational libraries (no LLM) receive no guidance — they emit observations and are otherwise opaque to Apollo.

Endpoints

GET  /api/v1/apollo/guidance?intent=<query>&layer=1|3
GET  /api/v1/apollo/guidance/schemas
GET  /api/v1/apollo/guidance/tools
GET  /api/v1/apollo/guidance/specs
GET  /api/v1/apollo/guidance/connections

The top-level /guidance endpoint accepts an intent description (free text or structured) and the consuming layer, and returns a ranked set of applicable artifacts — previewing what Apollo would currently inject. The sub-paths return filtered artifact views by type.

All endpoints require the admin role. L1 and L3 never call them (§Invariants 14).

Example response

{
  "intent_match": {"pattern_id": "ipat_abc", "score": 0.88},
  "schemas": [...],
  "tools": [
    {"name": "fusion_run_start", "description_override": "...", "routing_hint": "parallax"}
  ],
  "specs": [
    {"id": "spec_frag_123", "content": "For federate alignment, ensure lens binding..."}
  ],
  "connections": [
    {"from": "layer1.screening_intent", "to": "parallax.fusion", "confidence": 0.91}
  ]
}

Injection Channel

Apollo delivers guidance to L1 and L3 LLMs by attaching it to the existing request/response flow — symmetric piggybacking in both directions. There is no separate push transport, no long-lived connection, no service token, no SSE client in production. Guidance is computed at request time (in-process inside oracle, since Apollo lives there) and embedded in the envelope that was already travelling.

  • L1 path: oracle attaches current applicable guidance to every /chat response body.
  • L3 path: oracle attaches current applicable guidance to every outbound MCP tool dispatch.

Both paths are fresh-per-call by construction — there is no cache to go stale, no reconnect to replay, no disconnected subscriber to reconcile. Apollo lives inside oracle, so fetching guidance for an outbound envelope is an in-process Python call, not a network hop.

Guidance communication is unidirectional: Apollo → L1, Apollo → L3. Subscribers never POST guidance back (observation ingest is a separate path — §Ingest Semantics). Captured as §Invariants 14.

Why response-attached instead of a push channel

L1 is only doing LLM work when composing a response to the user's latest message — the act of calling /chat is what triggers that work. Any guidance change Apollo makes while L1 is idle has nothing to apply to until the next /chat, at which point the response can carry the freshest state. A separate push channel for idle L1 therefore provides no observable benefit and introduces a long-lived auth session to maintain.

L3 agents only exist inside a user-request context (oracle dispatches to them; they validate the forwarded user token). There is no service-token mechanism in axonis-core today (see §Authentication & Authorization). A long-lived L3 connection would require inventing one. Attaching guidance to the MCP dispatch uses the existing user-token-forwarding pattern and delivers guidance exactly when the agent needs it.

L1 path: attached to /chat responses

POST /api/v1/chat is oracle's user-facing chat surface, driven by oracle's own LLM tool-use loop (oracle/server/llm/tool_executor.py, 5 providers: anthropic / openai / groq / ollama / trinity). It is distinct from Apollo's admin chat at POST /api/v1/apollo/chat, which runs Apollo's separate MiniMax LLM for talking to Apollo's synthesis brain.

When oracle responds to a POST /chat, it calls Apollo's in-process oracle.guidance.for_l1(user=..., intent_context=...) before serializing, and embeds the result on the response envelope under apollo_guidance. Beacon-style L1 clients consume that field via their local ApolloGuidanceCache.update(...).

Model extension. Oracle's existing ChatResponse Pydantic model (oracle/server/api/routes.py) must be extended with an optional field:

class ChatResponse(BaseModel):
    response: str
    conversation_id: str
    tool_calls: list = Field(default_factory=list)
    model_used: str = ""
    tokens: dict = Field(default_factory=dict)
    apollo_guidance: dict | None = Field(default=None)   # added by SPEC-14

The field defaults to None, so pre-Apollo clients and responses where guidance is omitted (attach-timeout, Apollo unavailable, empty applicable set) serialize identically to today. Clients that don't know about the field simply ignore it.

Envelope shape when guidance is present:

{
  "response": "...assistant reply...",
  "conversation_id": "...",
  "tool_calls": [...],
  "model_used": "...",
  "tokens": {...},
  "apollo_guidance": {
    "as_of": "2026-04-17T10:30:00Z",
    "artifacts": [
      {
        "id": "pshim_xyz",
        "type": "PromptShim",
        "version": 7,
        "content": { ... },
        "applicability": { "intent_class": "...", "tags": [...] },
        "rationale": "Human-readable explanation of why this artifact is active now."
      }
    ],
    "rationale_summary": "3 PromptShim (s1,s2,s3); 2 ToolPairingHint (h1,h2) +1 capped (c1)"
  }
}

L1 receives the response, hands apollo_guidance.artifacts to its local ApolloGuidanceCache, and renders the assistant message. The payload is the complete applicable set for this user's L1 scope — not a diff — so cache replacement is strictly idempotent.

On the next user turn, L1 uses the freshly-populated cache to compose its prompt. Guidance staleness is bounded by a single turn.

L2 path: in-process cache for oracle's own chat LLM

Oracle's chat surface at POST /api/v1/chat runs its own LLM tool-use loop in oracle/server/llm/tool_executor.py (5-provider gateway: anthropic / openai / groq / ollama / trinity). Oracle is therefore also a guidance subscriber for its own LLM — distinct from L1 (beacon's LLM) and L3 (cortex/parallax's LLMs).

Because oracle hosts Apollo, no transport is needed. Oracle owns a process-local ApolloGuidanceCache populated directly from oracle.guidance.for_l2(...) (analogous to for_l1 and for_l3_agent) before each LLM turn. The tool-executor consults the cache via the canonical accessors (get_system_prompt_additions, get_spec_fragments, get_active_failure_patterns, get_tool_pairing_hints, get_tool_description_overrides, get_service_connection_hints) on every turn and folds the results into its system prompt and tool-catalog rendering, exactly as L1 and L3 subscribers do.

The L2 path is symmetric with L1/L3 in artifact applicability filtering (scope=l2 on the attacher), in the timeout budget (APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS), and in the failure posture (cache miss / timeout → tool-executor proceeds with no guidance, request still succeeds). It differs in transport only: no JSON serialisation, no envelope traversal — a direct in-process call.

L3 path: attached to MCP tool dispatches

When oracle dispatches a tool call to an L3 agent (component_kind == "agent"), oracle attaches Apollo's currently-applicable guidance inside the tool's arguments dict under the apollo_guidance key — mirroring the existing pattern oracle uses to inject llm_spec into arguments (oracle/server/mcp/server.py). This keeps the JSON-RPC envelope shape unchanged (params stays {name, arguments}) and requires no MCP handler changes on agent-side beyond the agent extracting and applying the new argument:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "fusion_run_start",
    "arguments": {
      "...tool-specific args...": "...",
      "apollo_guidance": {
        "as_of": "2026-04-17T10:30:00Z",
        "artifacts": [ ... ],
        "rationale_summary": "..."
      }
    }
  }
}

L3 agent-side MCP handlers extract apollo_guidance from arguments (the same way they currently extract llm_spec), hand it to their local ApolloGuidanceCache for the duration of this request's LLM turns, and strip it before passing the remaining arguments to the tool's business logic. Because L3 only acts inside a user-request context, cache lifetime naturally scopes to the request — no background state, no long-lived connection, no service-token novelty.

L3 libraries (component_kind == "library") do not receive apollo_guidance in their dispatches — oracle filters them out before serialization. Libraries have no LLM to improve (§Invariants 15).

Payload shape

apollo_guidance carries:

  • as_of — timestamp of the artifact snapshot. Used for traceability and admin debugging.
  • artifacts — the currently-applicable artifact set for the subscriber's scope. Each artifact has id, type, version, content, applicability, and rationale (see §Rationale and evidence).
  • rationale_summary — structured one-liner naming the attached artifact IDs per type, plus a +N capped (...) tail for artifacts the per-type cap held back. See §Prioritization Layers → Layer 5 for the exact format and the parallel aggregate_artifact_stats query for per-artifact stats.

There is no injection_id, no reason/trigger enum, no subscriber_scope, no evidence_ref on the per-call payload. That metadata lives in the audit log (§Audit log) — attaching it to every response/dispatch would balloon payload size with data that matters to admins, not to LLMs.

Freshness and ordering

Guidance is always at most one turn stale from each subscriber's perspective:

  • L1's next /chat call sees the freshest guidance. Between turns, L1's cache reflects the guidance as-of the most recent response.
  • L3's MCP dispatch carries guidance computed at the instant oracle is about to call. By construction the agent sees guidance current at dispatch time.

Because the cache is overwritten on every inbound response/dispatch, there is no "subscriber drift" problem to solve — the cache cannot diverge from Apollo.

Triggers (synthesis unchanged)

Apollo's Curator still commits artifact mutations event-driven (§Learner → LLM synthesis). The commits no longer trigger separate push events — they simply become the state that the next attached apollo_guidance payload reflects. Pause/resume of the Curator is therefore also a passive effect: paused Curator → artifact set stops changing → subscribers keep receiving the same state on subsequent calls.

Failure posture

  • Apollo slow: oracle's guidance-fetch call has a strict in-process budget (APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS, default 10 ms). On overshoot, oracle serializes the response/dispatch without apollo_guidance. Subscribers proceed without guidance on that turn — equivalent to pre-Apollo behavior. No user-visible failure.
  • Apollo has no applicable guidance: apollo_guidance is omitted (or null). Subscribers proceed without guidance. Normal state during Phase 1.
  • Apollo hallucinates a bad artifact: subscribers apply it on one turn; the Evaluator detects outcome degradation, demotes; the next attached payload reflects the demotion. Admin can force rollback at any time.
  • No network partition risk: Apollo is in-process with oracle. There is no network path between them that can fail.
  • Curator paused: attached payloads continue to reflect the state as-of the pause. Subscribers see frozen guidance until resume. Because every response/dispatch still carries the current set, subscribers never lose their guidance due to the pause — it just stops changing.

Rationale and evidence

Each artifact in the attached payload carries a rationale string (LLM-synthesized for LLM-driven proposals; templated from score decomposition for deterministic Evaluator actions). This is the same rationale written into apollo_audit (§Audit log). Subscribers may log it when applying the artifact to a prompt; admins query it via audit log or admin chat (§Admin Chat).

The fuller evidence_ref (pointers to observations, graph snapshot id, score decomposition, related drift events) is not carried in the per-call payload — it lives in apollo_audit. Admins retrieve it via explain_decision / discuss_decision in admin chat, which resolves the audit record.

Audit

Every Curator action writes an apollo_audit record with action, actor, trigger, rationale, and evidence_ref (§Audit log). Individual deliveries — attached payloads on responses and dispatches — are not audited. Delivery would produce one record per user turn per layer, far too noisy to be useful. The audit captures decisions; deliveries are implementation detail.

Subscriber SDK: ApolloGuidanceCache (pure local cache)

ApolloGuidanceCache in axonis-core is a pure in-memory cache with no transport. It has two surfaces:

Update (called by the subscriber's request handler):

  • cache.update(apollo_guidance_block) — replaces the cache's artifact set with the payload. Idempotent; the payload is the complete applicable set, not a diff.

Canonical accessors (consumed by the subscriber's LLM-turn codepath):

Method Returns Used at
get_system_prompt_additions(intent_context) Ordered list of PromptShim bodies System-prompt construction
get_spec_fragments(intent_context) List of SpecFragment RAG-like context insertion
get_tool_description_overrides(tool_name) Override dict or None Tool-catalog rendering
get_tool_pairing_hints(current_tool) List of ToolPairingHint After-tool-call reasoning
get_active_failure_patterns(intent_context) List of FailurePattern with diagnostic hints Pre-call guard; post-call error interpretation
get_service_connection_hints(intent_context) List of ServiceConnectionHint Service routing

Applicability filtering happens inside the cache: each artifact's applicability block is matched against the caller's intent_context. When multiple artifacts of the same type match, the SDK returns them ordered by (weight desc, recency desc); merge policy past ordering is the agent's choice.

No HTTP client, no long-lived connection, no authentication inside the SDK — the cache is a data structure inside the subscriber's process. SPEC-01 invariant 1 (axonis-core has no ML dependencies) is preserved; ApolloGuidanceCache is pure Python data structures.

Empty-cache fallback: if no apollo_guidance has yet been delivered to this subscriber (first call, Apollo off, APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS overshoot), all accessors return empty lists / None. The subscriber proceeds without guidance. This is the safe default pre-Apollo behavior.

Admin inspection

Admins can preview what Apollo would attach on the next request:

Method Path Purpose
GET /api/v1/apollo/guidance?scope=l1 Preview current L1-scoped artifact set
GET /api/v1/apollo/guidance?scope=l3:<service_name> Preview current L3-scoped artifact set for a given agent
GET /api/v1/apollo/guidance/stream?scope=<scope> Admin-only SSE feed of Curator commits in real time (debugging)

The SSE feed is a debugging aid only — production delivery never uses it. All admin inspection endpoints require the admin role.

Prioritization Layers

The attacher must not only find applicable artifacts but choose which subset reaches the receiver's LLM: a naive "match everything, send everything" bloats the receiver's prompt budget as the artifact set grows and makes operator-promoted artifacts indistinguishable from low-value ones. Prioritization is implemented as seven cooperating layers that make selection observable, quality-aware, and bounded. They are ordered by data dependency — earlier layers don't depend on later ones, and each stays useful even when the layers above it are disabled.

Layer 1 — Capped-artifact observability

When the per-type attach cap drops an artifact (see Layer 2 for the cap mechanism itself), each held-back artifact gets a row in apollo_lineage_events with kind: "capped", the artifact's artifact_type, the call's scope and trace_id. Two query paths read these rows:

  • query_capped_for_artifact(artifact_id, *, service_name=None, limit=500) — list traces where this artifact was capped.
  • aggregate_artifact_stats(artifact_id, *, since=None, limit=1000){attached_count, capped_count, last_attached_at, last_capped_at}.

Both surfaces are exposed on the admin REST API as GET /lineage/capped and GET /artifacts/{artifact_id}/stats. The same lineage rows are also available to the evaluator for "matched-but-shadowed" diagnostics.

Invariant. query_traces_with_artifact and query_trace_attribution filter kind: "capped" out by default — the "applied" semantics of /lineage is unchanged.

Layer 2 — Selection sort key

oracle.guidance.attacher._sort_key orders matched artifacts before the cap fires. Each tier has a default that preserves the previous tier's behavior, so the chain stays well-defined even with sparse data:

Tier Source Default when missing
1 content.evaluator_score 1.0 (innocent until signaled)
2 content.confidence 0.0 (no opinion stated)
3 applicability specificity (count of populated narrowing fields) 0
4 content.weight 1.0
5 as_of ""

evaluator_score defaults to 1.0 to match ArtifactScore.score's baseline (a never-signaled artifact is treated as innocent). confidence defaults to 0.0 because synthesis confidence is an opt-in endorsement — absence means "no opinion." Specificity activates today and is the practical lever when the upper tiers tie; tiers 1 and 2 become load-bearing once their sources flow (see Layers 4-A and 4-B).

Per-type caps live in config:

APOLLO_ATTACH_CAP_PROMPT_SHIM=10
APOLLO_ATTACH_CAP_SPEC_FRAGMENT=5
APOLLO_ATTACH_CAP_TOOL_PAIRING_HINT=5
APOLLO_ATTACH_CAP_FAILURE_PATTERN=10
APOLLO_ATTACH_CAP_SERVICE_CONNECTION_HINT=5
APOLLO_ATTACH_CAP_INTENT_PATTERN=5

ApolloGuidanceCache._sorted on the receiver side uses an identical priority key so the order the sender selected is preserved through to the LLM.

Layer 3 — Signal preservation at promote

The promote action's content-extraction helper (_content_from_proposal) strips proposal metadata before storing on the artifact. The three ranking signals (evaluator_score, confidence, weight) must not be added to the metadata strip-list. The constants _METADATA_KEYS and _RANKING_SIGNALS in apollo/curator/actions.py make this contract explicit; TestRankingSignalContract enforces it.

Invariant. If a proposal carries evaluator_score, confidence, or weight at the top level, the promoted artifact's content must carry them too.

Layer 4-A — Evaluator score writeback

apollo/evaluator/persist.py:persist_score_to_artifact writes content.evaluator_score and content.score_decomposition to the artifact document after every signal application in the ingest worker. Uses a Painless script to preserve the type-specific content fields (text, signature, etc.).

Properties: - Fire-and-forget. Never blocks the ingest hot path. - Idempotent. retry_on_conflict=3 handles concurrent worker writes. - Kill-switch. APOLLO_EVALUATOR_PERSIST_SCORES_ENABLED=false disables persistence without touching the in-memory engine (audit + cascade paths still work). - Graceful degradation. Failures are logged and counted (apollo_evaluator_score_persist_failed_total); the in-memory engine remains authoritative.

Layer 4-B — Synthesis confidence

Every synthesis prompt (build_failure_pattern_prompt, build_intent_pattern_prompt, build_prompt_shim_prompt, build_sweep_prompt) requires the LLM to emit a top-level confidence: 0.0..1.0. The _SHARED_RULES block explains the semantics — reserve high confidence for patterns the model would stake its reputation on, because Apollo uses it to rank artifacts at attach time.

apollo/learner/synthesis.py:_normalize_confidence is called from _record_proposal and: - Clamps values to [0.0, 1.0]. - Coerces missing or unparseable inputs to _NEUTRAL_CONFIDENCE = 0.5 so a malformed LLM response doesn't unfairly downrank an otherwise-valid proposal.

The normalized value rides on the proposal through promote (via the Layer 3 contract) onto artifact.content.confidence, where Layer 2's sort consumes it.

Layer 5 — Deepened rationale_summary + per-artifact aggregation

oracle.guidance.attacher._summarize emits a structured summary that names attached and capped artifact IDs per type:

"3 PromptShim (s1,s2,s3); 2 ToolPairingHint (h1,h2) +1 capped (c1)"

ID lists truncate to _SUMMARY_ID_PREVIEW = 5 with a +N tail. Empty input still produces "". Types are sorted alphabetically so summaries diff cleanly across calls.

aggregate_artifact_stats (Layer 1) is the symmetric on-demand summary keyed by artifact rather than by attach call.

Layer 6-A — Artifact embedding at promote

apollo/learner/similarity.py:compute_embedding reuses axonis.memory.embedder.embed (sentence-transformers, gated by the [memory] extra). The vector is stored on content.embedding_vector. Type-aware text extraction handles each artifact type's content shape (PromptShim text, FailurePattern signature+remediation, etc.).

Graceful degradation. When sentence-transformers is unavailable, compute_embedding returns None; the promote still succeeds with no embedding stored. Downstream similarity checks (6-B, 6-C) skip artifacts without embeddings.

Layer 6-B — Promote-time similarity advisory

After the embedding is computed, the promote handler scans active artifacts at the same (type, service_name, tool_name) scope and surfaces matches above the cosine threshold in ActionResult.similar_artifacts. Default threshold: APOLLO_SIMILARITY_THRESHOLD=0.9.

The advisory is informational only — promote still succeeds. Admin chooses whether to demote + supersede the prior(s) by re-promoting with supersede: true and the prior's IDs.

Layer 6-C — Curator-time similarity sweep

apollo/learner/coalescer.py:run_periodic is a fifth background loop alongside snapshot, curator-auto, maintenance, and synthesis-sweep. Each tick:

  1. Loads all status=active artifacts.
  2. Partitions by (type, service_name, tool_name).
  3. Within each partition, union-finds clusters where every pairwise cosine ≥ APOLLO_COALESCER_THRESHOLD (default 0.85, slightly looser than 6-B's promote-time threshold).
  4. For each cluster, calls Apollo's LLM via build_coalesce_prompt to write a coherent merger.
  5. Records the merger as a proposal on apollo_proposals with supersedes: [id1, id2, ...] so admin promote demotes the components atomically.

Bounded per sweep: APOLLO_COALESCER_MAX_CLUSTERS_PER_RUN=5 (defensive LLM-cost cap). Off by default (APOLLO_COALESCER_ENABLED=false) — operators opt in once they're ready to budget the LLM calls and review the proposals.

promote() extends the supersede flag's semantics: when the proposal carries supersedes: [...], each listed artifact is demoted alongside the new promote, in the same atomic batch.

Metrics surface

Each layer adds telemetry so operators can see what's happening:

Metric Source layer
apollo_guidance_attach_null_total{scope, reason} observability over the attach path's null returns
apollo_guidance_attach_success_total{scope} counterpart counter for successful attaches
apollo_guidance_attach_payload_bytes{scope} (histogram) size growth — operators alert if it bloats
apollo_guidance_attach_artifact_count{scope} (histogram) distribution of artifacts per attach
apollo_guidance_attach_capped_total{scope, artifact_type} per-type drop counts (Layer 2 → 1)
apollo_evaluator_score_persisted_total / apollo_evaluator_score_persist_failed_total Layer 4-A health
apollo_coalescer_proposals_emitted_total / apollo_coalescer_merge_failed_total Layer 6-C health

A guidance_health block on GET /stats summarizes per-scope success/null breakdown for at-a-glance review.

Disabling layers

Every layer can be turned off independently:

APOLLO_GUIDANCE_ATTACH_ENABLED=false      # disables Layer 2 + everything above
APOLLO_EVALUATOR_PERSIST_SCORES_ENABLED=false  # Layer 4-A
APOLLO_SIMILARITY_ENABLED=false           # Layer 6-A + 6-B
APOLLO_COALESCER_ENABLED=false            # Layer 6-C (default off)
APOLLO_LINEAGE_PERSIST_ENABLED=false      # Layer 1

When disabled, the layer degrades to no-op; the rest of the system keeps running with the next-best signal.

Curator

The Curator is the only component empowered to mutate Apollo's memory. All mutations are bounded and auditable.

Allowed autonomous actions

  • Promote an artifact (increase its weight in guidance results)
  • Demote an artifact (hide from guidance without deleting)
  • Forget an artifact (delete after it has been demoted for N evaluation cycles)
  • Edit artifact metadata (tags, applicability, version, human-readable notes)
  • Summarize / compact raw observations into a new artifact

Disallowed actions (hard invariants)

  • Change auth or guardrails configuration
  • Widen or narrow a caller's tool access
  • Read or mutate another user's conversation data
  • Mint tokens, escalate privileges, or bypass OAuth
  • Call backend services on behalf of any user
  • Modify or delete audit log records

Versioning

Apollo uses a two-tier versioning model. Artifacts are versioned per-mutation; graphs are captured via snapshots (see §Snapshots and trajectory and §Retention). Both are in place from Phase 1 — versioning is cheap to establish up front and impossible to reconstruct retroactively once Curator empowerment goes live in Phase 3.

Artifacts (IntentPattern, FailurePattern, PromptShim, SpecFragment, ToolPairingHint, ServiceConnectionHint, CapabilityMap, DecisionTrajectory, DriftEvent, IntentSchema, SchemaDrift, PromptShape). Every mutation — autonomous Curator action, admin edit, synthesis-proposed edit, rollback — produces a new version:

  • Current version lives in apollo_artifacts.
  • Every prior version is copied to apollo_artifact_history before the mutation.
  • Each artifact record carries version, prev_version_id, change_reason, actor.
  • apollo_artifact_history has no expires_ts — prior versions are retained indefinitely as the rollback substrate (§Retention).
  • Rollback: POST /api/v1/apollo/artifacts/{id}/rollback with target version or prev_version_id replaces the current record and writes a new version whose prev_version_id points at the post-rollback state (so rollback itself is a versioned event, recorded in audit).

Graphs (DecisionGraph). Per-observation node/edge mutations are too high-frequency to version individually. Graph rollback uses snapshots instead:

  • Hourly snapshots for 7 days, daily for 30 days, weekly for 90 days (per §Retention).
  • Admin rollback on a graph restores from a prior snapshot. Coarser granularity than artifact rollback by design.
  • Structural mutations initiated by admin or Curator on a graph (e.g., manually forgetting a node, merging two nodes) are tracked as audit events in apollo_audit with before/after snapshot IDs.

Audit log

Every Curator action, Evaluator-driven demotion, drift-hold, upstream artifact re-flag, and admin-chat state mutation writes a record to the Elastic apollo_audit index. The index follows the shared axonis Elastic convention (flat index, UDS shell, expires_ts, delete_by_query cleanup — see §Index mappings and templates and §Retention).

Record schema: ApolloAuditRecord (oracle/curator/audit.py), written via write_audit() and persisted through the ApolloAudit UDS store (oracle/memory/store.py). Beyond the standard UDS/expires_ts envelope it carries:

  • Action axesaction (ActionKind: promote / demote / forget / edit / rollback / compact / drift_hold / upstream_flag / pause_curator / resume_curator), actor (curator_auto, evaluator_auto, or admin:<username>), and trigger (the free-form cause, e.g. evaluator_score_below_threshold, l3_performance_cascade, drift_event, admin_manual, synthesis_proposal).
  • Targetartifact_id, artifact_type, before_version_id / after_version_id, and related_drift_event_id when trigger = drift_event.
  • Scoring — nullable evaluator_score and score_decomposition (per §Evaluator outputs); upstream_artifact_ids when a cascade flagged parents.
  • Explanationrationale (REQUIRED, always present: LLM-synthesized for LLM-driven actions, templated from score_decomposition + trigger for deterministic ones) and evidence_ref (observation ids, graph snapshot id, related audit ids).
  • Lifecycleindefinite (true ⇒ null expires_ts, never purged) and optional admin-supplied admin_note.

Rationale vs. admin_note. rationale is Apollo's own account of why it acted — always present, always auto-generated. admin_note is the admin's own commentary when they take an action — optional, human-supplied. Both are preserved and queryable.

Retention. Default 90 days (configurable via APOLLO_AUDIT_RETENTION_DAYS), enforced by the maintenance task's delete_by_query on expires_ts. Records marked indefinite: true have a null expires_ts and are never deleted — used for critical admin actions (forget of an artifact, pause/resume of Curator, rollback of a versioned artifact). The admin API allows setting indefinite when taking such actions.

Queryable. GET /api/v1/apollo/audit supports filters on time range, action, actor, artifact id, artifact type, trigger, and score-decomposition terms (e.g., "all demotions triggered primarily by L3 errors last 7 days"). Score decompositions let admins see why a score moved without re-deriving it from observations.

Evaluator

The Evaluator scores artifacts based on outcome correlation: after an artifact is published to guidance, do subsequent traces that used it produce better outcomes than traces that did not?

Inputs

  • Raw observations (trace outcomes)
  • Artifact usage records (which artifacts were returned in guidance, which were incorporated)
  • Explicit feedback signals

Failure signals (feeds the evaluator)

An event is considered a failure (negative signal for any artifact associated with its trace) if any of the following:

  1. Layer 3 returned an error (HTTP 5xx or tool exception) — Layer 3 performance signal. Applies to both agent and library observations. Under oracle-sole-observer (§Invariants 14), the observation is emitted by oracle; the signal keys on the envelope's service field (the observed L3 target), not on who performed the HTTP POST.
  2. Output schema mismatched the Layer 1 intent schemaLayer 3 performance signal. Applies only when the observed L3 service is an agent (component_kind == "agent" on its ServiceRegistry record). Libraries have no agent-level intent contract; their outputs are raw CRUD/compute results and schema mismatch is not evaluated for them. The Evaluator looks up component_kind by the envelope's service field at signal-application time — oracle is always the actual emitter, but the service it observed is what the contract keys on.
  3. User feedback was negative (thumbs-down, correction, abandoned conversation)
  4. Self-assessed evaluator confidence was below threshold

All four feed the Evaluator; signal 2 is gated on component_kind per the above. Weights are configurable via APOLLO_EVALUATOR_WEIGHTS.

Layer 3 performance carries amplified penalty

Signals 1 and 2 both reflect Layer 3 performance — what the backend services actually produced when acting on Apollo's guidance. If Layer 3 components are not performing well, that is a strong indication that the workflow generation (Layer 1 prompts) and the artifacts driving that generation need to be updated.

Accordingly, the Evaluator applies an amplified penalty to Layer 3 performance failures:

  • Default weight tiers: L3_performance: 3.0, user_feedback: 1.5, evaluator_confidence: 0.5.
  • Sustained L3 underperformance against a given artifact accelerates the Curator lifecycle:
  • Normal demotion cycle requires N=5 below-threshold evaluation cycles before forget.
  • L3-driven demotion triggers after N=2 cycles when signals 1 or 2 dominate the score. Rationale: if services are reliably failing on an artifact's guidance, waiting out a long demotion window lets bad guidance keep shaping traffic.
  • When an artifact's score degradation is attributable primarily to Layer 3 signals, the Evaluator additionally flags the upstream artifacts — the IntentPattern, PromptShim, or SpecFragment that shaped the Layer 1 prompt which in turn produced the Layer 3 call — for LLM review on the next synthesis trigger. The synthesis LLM may propose edits to those upstream artifacts, creating a cross-layer correction cycle.
  • Repeated L3 failures on the same artifact within a short window escalate to a DriftEvent (not just a score drop), forcing admin review rather than silent demotion.

Weights and thresholds are tunable via env vars (APOLLO_EVALUATOR_WEIGHT_L3_ERROR, APOLLO_EVALUATOR_WEIGHT_SCHEMA_MISMATCH, APOLLO_EVALUATOR_WEIGHT_USER_FEEDBACK, APOLLO_EVALUATOR_WEIGHT_CONFIDENCE, APOLLO_EVALUATOR_L3_FAST_DEMOTE_N).

Outputs

Per-artifact rolling score (exponential moving average). Scores feed the Curator's demote/forget policies. Scores are visible in admin stats and in the audit log when they trigger actions. Score decompositions (per-signal contributions) are retained so admins can see why a score moved — Layer 3 errors vs. user feedback vs. schema mismatch are distinguishable in the audit trail.

Admin Chat

A conversational interface to Apollo, gated by role admin via oracle's existing guardrails (SPEC-03 §Guardrails).

POST /api/v1/apollo/chat

Request body mirrors oracle's /chat:

{
  "message": "Forget everything Apollo learned about cohort X last week",
  "conversation_id": "apollo_admin_sess_...",
  "model": "default"
}

The admin chat uses Apollo's own LLM (separate from oracle's primary LLM) with a set of memory-management tools:

  • list_memories(filter)
  • get_memory(id)
  • forget_memory(id)
  • promote_artifact(id) / demote_artifact(id)
  • rollback_artifact(id, to_version)
  • rollback_graph(graph_id, to_snapshot)
  • trigger_synthesis(trace_id?)
  • explain_decision(trace_id | artifact_id | audit_id) — returns the rationale + evidence_ref for a Curator action
  • list_decisions(artifact_id?, since?, trigger?) — audit-filtered view of recent Curator actions
  • discuss_decision(artifact_id | audit_id) — opens a focused conversation thread: Apollo's LLM replies with the stored rationale, walks through the evidence (graph snapshot, score decomposition, upstream artifacts), and answers admin follow-ups. The admin can invoke promote/demote/rollback/forget tools inline in the same thread to act on the finding.
  • pause_curator() / resume_curator()

Admin ↔ Apollo conversation

Every Curator action carries a rationale written by Apollo at commit time and persisted in apollo_audit. Admin chat is where those rationales become conversational: an admin asks "why did you just demote pshim_xyz?" and Apollo's LLM retrieves the audit record, reads out the rationale and evidence, and answers follow-ups by re-reading the underlying observations and graph state.

So admin chat is not just a command console — it is the review surface for Apollo's own findings. Admins probe rationales, challenge them, and issue corrections (rollback, forget, edit, pause) without leaving the conversation. Every follow-up action is itself audited to the index with actor: "admin:<username>" and a fresh rationale, preserving the admin's reasoning alongside Apollo's.

Non-admin users cannot reach /chat; their interaction with Apollo is purely transitive, through oracle.

Endpoints

REST (mounted under oracle's /api/v1/apollo/)

Method Path Who Purpose
POST /api/v1/apollo/observations Admin + out-of-process services Admin replay/seed, plus the fallback ingest path for services outside oracle's MCP dispatch reach. Phase-1 emitters (oracle + cortex) do not use this endpoint — oracle emits on their behalf in-process (§Ingest Semantics).
GET /api/v1/apollo/guidance?scope=l1 Admin Preview current L1-scoped artifact set
GET /api/v1/apollo/guidance?scope=l3:<service> Admin Preview current L3-scoped artifact set for an agent
GET /api/v1/apollo/guidance/schemas Admin Inspect learned intent schemas
GET /api/v1/apollo/guidance/tools Admin Inspect tool descriptions / routing hints
GET /api/v1/apollo/guidance/specs Admin Inspect spec fragments
GET /api/v1/apollo/guidance/connections Admin Inspect service-connection hints
GET /api/v1/apollo/guidance/stream?scope=<scope> Admin Real-time SSE feed of Curator commits (debugging only)
POST /api/v1/apollo/chat Admin Conversational admin interface
GET /api/v1/apollo/memories Admin List observations with filters
GET /api/v1/apollo/memories/{id} Admin Inspect one observation
POST /api/v1/apollo/memories Admin Seed an observation manually
PATCH /api/v1/apollo/memories/{id} Admin Edit metadata (tags, notes)
DELETE /api/v1/apollo/memories/{id} Admin Forget
GET /api/v1/apollo/artifacts Admin List learned artifacts
GET /api/v1/apollo/artifacts/{id} Admin Inspect one artifact + version history
PATCH /api/v1/apollo/artifacts/{id} Admin Edit
POST /api/v1/apollo/artifacts/{id}/promote Admin Promote
POST /api/v1/apollo/artifacts/{id}/demote Admin Demote
POST /api/v1/apollo/artifacts/{id}/rollback Admin Revert to a prior version
DELETE /api/v1/apollo/artifacts/{id} Admin Forget
GET /api/v1/apollo/audit Admin Query audit log
POST /api/v1/apollo/learn Admin Manually trigger an Apollo synthesis pass
GET /api/v1/apollo/stats Admin Apollo's own observability (counts, timings, scores)

MCP (admin chat tools)

Apollo's MCP tools mirror the admin CRUD surface, exposed only to Apollo's own admin chat LLM (§Admin Chat) — not aggregated into oracle's user-facing /agentspace MCP catalog. The tools are served from a private MCP endpoint mounted by oracle.oracle.chat.server and reachable only through the admin-chat conversation; they are never visible to L1 or L3 LLMs.

  • apollo_list_memories, apollo_get_memory, apollo_forget_memory
  • apollo_list_artifacts, apollo_get_artifact, apollo_promote_artifact, apollo_demote_artifact, apollo_rollback_artifact, apollo_forget_artifact
  • apollo_list_graphs, apollo_get_graph_snapshot, apollo_rollback_graph
  • apollo_query_audit
  • apollo_trigger_synthesis
  • apollo_list_decisions, apollo_explain_decision, apollo_discuss_decision
  • apollo_pause_curator, apollo_resume_curator
  • apollo_stats

Authentication & Authorization

  • Admin endpoints require admin role via oracle's OAuth middleware + guardrails (SPEC-03).
  • Guidance GET endpoints are admin-only. L1 and L3 never call them. They exist for admin inspection of what Apollo would currently inject.
  • Secondary ingest path (POST /api/v1/apollo/observations) accepts either the admin's Bearer token (for replay/seed) or, for any out-of-process emitter, the user's forwarded Bearer token — the same token oracle forwards downstream in its existing cross-service calls. There is no service-token infrastructure in axonis-core today; every cross-service call in the stack forwards the user's Keycloak-issued token (verified end-to-end against JWKS). Admin replay/seed additionally requires the admin role. Phase-1 emitters do not exercise this path.
  • Oracle's primary in-process path (all L1-relayed events + oracle's own llm_turn + oracle-observed L3 tool_output / tool_error + final_response) bypasses network auth — it is a direct function call within the same process, already authenticated at the ingress by OAuthMiddleware.
  • Neither L1 nor L3 authenticates to Apollo — neither layer addresses Apollo on any path (ingest or guidance). Both talk to oracle; oracle handles Apollo (§Invariants 14).
  • Injection channel (response-attached) rides the ambient auth of the envelope it is embedded in. The /chat response is already authenticated per the inbound /chat request; the outbound MCP dispatch is already authenticated per oracle's forwarded token. No additional auth layer is introduced for attached guidance.
  • Admin SSE debug feed uses the same OAuthMiddleware on connection handshake and is gated to the admin role.
  • Apollo honors all oracle guardrails. Curator cannot widen a caller's tool access. Attached guidance that references tools a subscriber cannot use is filtered out before the envelope is serialized.
  • Deferred: once a Keycloak client-credentials grant is introduced for service-to-service auth (noted in SPEC-03 as pending), APOLLO_SERVICE_TOKEN-authenticated ingest from background/batch workers becomes possible. Until then, ingest without a user token context is not supported.

Ingest Semantics

Observation ingest has two paths. The primary path, used by every Phase-1 emitter (oracle + cortex), is in-process only — oracle observes the envelopes flowing across its own boundaries and calls oracle.oracle.observer.ingest directly. The secondary path is the HTTP POST endpoint, mounted for admin replay/seed and for future services running outside oracle's MCP dispatch reach.

Primary path: in-process emission by oracle

Per §Invariants 14, neither L1 nor L3 addresses Apollo directly. Oracle is Apollo's sole emitter in production. On every inbound /chat request, oracle extracts L1 signals from the request body and calls the observer in-process. On every outbound MCP dispatch, oracle observes the round-trip and emits in-process on the L3 service's behalf:

Event(s) Emitted when Emitter call site
intent_schema, user_prompt, user_feedback /chat request arrives or a feedback submission is posted oracle/server/api/routes.py
llm_turn oracle's own LLM request/response cycle completes oracle/server/llm/tool_executor.py
tool_output, tool_error an outbound MCP dispatch to an L3 service returns oracle/server/llm/tool_executor.py + oracle/server/mcp/server.py (proxy path)
final_response oracle is about to return the /chat response body oracle/server/api/routes.py

All emissions flow through helpers in oracle/oracle/hooks/chat.py which enqueue the envelope on the in-process async queue via oracle.oracle.observer.ingest.ingest(...). No network call. No authentication layer (the helpers live inside oracle's process, authenticated at the ingress by OAuthMiddleware). Failure modes are purely local: a full queue increments apollo_ingest_queue_dropped_total; an observer exception is caught and logged so the user request is unaffected.

Secondary path: HTTP POST (admin replay + out-of-process services)

The POST /api/v1/apollo/observations endpoint remains mounted on oracle's Starlette app for two use cases:

  1. Admin replay/seed — an admin manually re-ingests observations (e.g., to backfill after an outage or to seed synthetic test data). Requires the admin role.
  2. Services outside oracle's MCP dispatch reach — any future service whose outputs are not observable through an oracle-mediated MCP round-trip can emit via ApolloClient. None of the Phase-1 emitters use this path.

Endpoint:

POST /api/v1/apollo/observations
Content-Type: application/json
Authorization: Bearer <user-token>         # admin token for replay, or the user's forwarded token for out-of-process services
traceparent: 00-<trace-id>-<parent-span-id>-<flags>

{ "observations": [<envelope>, ...] }

A single envelope is always valid; the array form enables batching on the client. Apollo responds 202 Accepted as soon as every envelope is placed on the in-process queue. Per-envelope validation happens inside the background worker and is logged (not bubbled to the caller) so a single bad envelope does not fail a batch.

The HTTP POST is a fire-and-accept call. Because Apollo's request handler does nothing but enqueue, the server-side operation is a local memory write — never a WAN hop inside the request. Client-side timeouts can therefore be generous (default 30 s) without risking silent drops from network jitter: the handler always responds in sub-millisecond time on a healthy Apollo.

Client-side helper: ApolloClient

ApolloClient in axonis-core is the HTTP client used by the secondary path. Phase-1 services (oracle + cortex) do not import it — oracle emits in-process and cortex emits nothing at all. ApolloClient is retained so admin tooling and any future out-of-process emitter can reach the endpoint without a bespoke HTTP client.

ApolloClient.emit(envelope) does a single httpx.AsyncClient.post with:

  • A generous request timeout (APOLLO_INGEST_POST_TIMEOUT_SEC, default 30).
  • Bounded retries with exponential backoff + jitter on transient failures (APOLLO_INGEST_RETRY_ATTEMPTS, default 2; APOLLO_INGEST_RETRY_BASE_MS, default 200; APOLLO_INGEST_RETRY_CAP_MS, default 2000). Transient = timeout, 5xx, 429, connection error. 4xx except 429 is not retried.
  • Client-side batching via a size-or-interval hybrid: APOLLO_INGEST_BATCH_SIZE (default 50) or APOLLO_INGEST_FLUSH_INTERVAL_MS (default 500), whichever first.
  • Lifecycle flush on process shutdown (signal handler + atexit) and on explicit ApolloClient.flush() calls.

ApolloClient is pure HTTP — the same shape as axonis-core's RestClient and MCPClient (axonis_core/gateway/). No new transport primitive is introduced.

Server side: in-process async queue

Apollo's ingest handler is thin: it parses envelopes, hands each to ingest() (oracle/observer/ingest.py), and returns 202 as soon as every envelope is enqueued. ingest() never blocks the caller — it uses put_nowait and records a drop (apollo_ingest_queue_dropped_total) instead of awaiting backpressure; accepted envelopes increment apollo_ingest_accepted_total. Both the in-process path (oracle calls ingest() directly) and the HTTP path (the oracle.guidance.api route handler) funnel through the same entry point.

The queue is bounded by APOLLO_INGEST_QUEUE_MAXSIZE (default 10000). When the queue fills, put_nowait raises QueueFull and Apollo increments apollo_ingest_queue_dropped_total — the failure is never silent, visible on /stats under degraded_emitters.

A pool of background worker coroutines (APOLLO_INGEST_WORKER_CONCURRENCY, default 4) drains the queue. Each worker performs the full ingest: normalize → write to apollo_observations → update graphs → dispatch synthesis triggers per §Learner. Worker failures are logged and the envelope is reprocessed on a bounded retry budget (APOLLO_INGEST_WORKER_RETRY_ATTEMPTS, default 2) before being moved to a dead-letter log (APOLLO_INGEST_DEAD_LETTER_PATH, optional JSONL file; unset by default).

Failure visibility

No silent failure modes exist on the ingest paths — primary (oracle in-process) and secondary (HTTP POST). Every failure kind is counted. The {service} label is the envelope's service field — the observed L3 target for Phase-1 emissions (oracle is the actual emitter but per-service visibility is what operators need).

Metric Meaning
apollo_ingest_accepted_total{service} Envelopes successfully enqueued (both paths)
apollo_ingest_queue_dropped_total{service} Envelopes rejected because the queue was full (both paths)
apollo_ingest_post_failure_total{service, kind} Secondary-path POST failures after retries exhausted (timeout / 5xx / etc.). Never fires for Phase-1 emitters (they go in-process).
apollo_ingest_worker_failure_total{service} Background-worker failures after retries (moved to dead-letter) — applies to both paths
apollo_ingest_queue_depth Current depth of the in-process queue
apollo_ingest_last_ingest_ts{service} Timestamp of last successful enqueue per service — covers both oracle's in-process call and secondary-path POSTs
apollo_ingest_last_drain_ts{service} Timestamp of last successful worker drain per service

Services whose last_ingest_ts is older than APOLLO_INGEST_STALE_WARN_SEC (default 300) for a service that should be active are surfaced on /stats under degraded_emitters. For Phase-1 services, "degraded" means oracle stopped observing them (e.g., oracle hasn't dispatched an MCP call to cortex in five minutes) — not that a POST failed.

Dedup on at-least-once delivery

Client retries can produce duplicate envelopes. Apollo's observer deduplicates on (trace_id, event_type, timestamp, service) within APOLLO_INGEST_DEDUPE_WINDOW_SEC (default 300) before writing to Elastic.

Config knobs (all prefixed APOLLO_)

Env var Default Purpose
APOLLO_INGEST_BATCH_SIZE 50 Max envelopes per POST body
APOLLO_INGEST_FLUSH_INTERVAL_MS 500 Max time an envelope waits in client buffer before flushing
APOLLO_INGEST_POST_TIMEOUT_SEC 30 Per-POST HTTP timeout — generous, since the server handler is in-memory only
APOLLO_INGEST_RETRY_ATTEMPTS 2 Bounded client retries on transient failure
APOLLO_INGEST_RETRY_BASE_MS 200 Base delay for exponential backoff
APOLLO_INGEST_RETRY_CAP_MS 2000 Max delay between retries
APOLLO_INGEST_QUEUE_MAXSIZE 10000 Server-side in-process queue capacity
APOLLO_INGEST_WORKER_CONCURRENCY 4 Number of background worker coroutines draining the queue
APOLLO_INGEST_WORKER_RETRY_ATTEMPTS 2 Bounded worker retries before dead-letter
APOLLO_INGEST_DEAD_LETTER_PATH unset Optional JSONL path for envelopes moved to dead-letter after worker retries exhausted
APOLLO_INGEST_STALE_WARN_SEC 300 Seconds without a successful POST before an expected-active service is flagged
APOLLO_INGEST_DEDUPE_WINDOW_SEC 300 Window for (trace_id, event_type, timestamp, service) dedupe on at-least-once delivery

Layer 1 Intent Schema Obligation

Layer 1 is expected but not required to emit an intent_schema observation with each request. The obligation is best-effort throughout Phase 1 and Phase 2, with a configurable path to required once Layer 1's schema contracts stabilize.

Best-effort mode (default)

  • Layer 1 SHOULD include an intent_schema block in every /chat request body it sends to oracle. Oracle extracts the block and emits the intent_schema observation to Apollo in-process (§Invariants 14 — L1 never addresses Apollo). A request without the block is still served; oracle simply emits no intent_schema observation for that trace.
  • If a schema is present on a trace, graph nodes are typed explicitly and the schema_mismatch failure signal (§Evaluator signal 2) is active for that trace.
  • If a schema is absent, Apollo's extractors fall back to prompt-inference and mark the resulting nodes inferred=true. Drift detection and evaluator confidence weight inferred nodes lower. The schema_mismatch signal is not evaluated for that trace; the L3-performance penalty (§Evaluator) still fires on signal 1 (hard errors), but signal 2 is dark.
  • GET /api/v1/apollo/stats reports intent_schema_coverage — percentage of traces with a Layer 1 schema in the last rolling window — so admins can see when Layer 1 coverage is high enough to flip to required.

Required mode

  • APOLLO_REQUIRE_INTENT_SCHEMA=true flips behavior: oracle rejects inbound /chat requests whose body lacks an intent_schema block with a 400 at the ingress — L1 is the direct caller and sees the rejection. Traces without a schema are never created; nothing to drop at the Observer layer.
  • The flip is a config change, not a code change. No Apollo, oracle, or L1 redeploy is needed — but Layer 1's /chat emission behavior must already include the schema or the flip will start rejecting real traffic.
  • Phase 3 is the expected time to flip, once Curator empowerment demands the cleaner signal. Admin can flip earlier if stats show high coverage.

Logging

Every Apollo module and every service participating in Apollo's observation / injection loop uses the axonis-core logger rather than a module-local logging.getLogger() call. The logger module is axonis.logger — the axonis-core equivalent of the athena logging utility at athena/athena/logger.py. Both implement the same three-logger convention (log, error, audit) with identical handler shapes, so logs from any component read coherently when aggregated.

Three loggers, three audiences

Logger When to use Destination
log Normal operational telemetry — info, warning, debug. Console + axonis.log
error Exceptions, permanent failures, data-loss events, misconfiguration. Console + error.log
audit Important transactions that must be traceable independently of volume. audit.log (file only)

Import pattern (oracle, axonis-core, beacon, cortex):

from axonis.logger import log, error, audit

Services that already depend on athena (e.g., athena itself) use from athena.logger import log, error, audit instead. The API and format are identical between the two.

What counts as audit-worthy

Apollo MUST route the following transactions through the audit logger so they leave a trail in audit.log separate from regular operational noise:

  • Worker pool start / shutdown / cancellation (§Ingest Semantics).
  • Graph snapshot completion (§Snapshots and trajectory) — per hour.
  • Every Curator action — promote, demote, forget, edit, rollback, compact, drift-hold, upstream-flag, pause_curator, resume_curator. (Complementary to the apollo_audit Elastic index: the audit log captures the event as structured text alongside other platform audit events; the Elastic index is the queryable, structured source of truth.)
  • LLM synthesis proposals that result in a Curator commit (the proposal → drift-check → commit boundary).
  • DriftEvent creation (§Graph-anchor drift check).
  • Admin chat actions that mutate state, logged with actor: "admin:<username>".
  • Guidance injection commits — every push of apollo_guidance onto an outbound envelope is audit-worthy at the commit level, though the per-turn attachment is a delivery detail (not audited).
  • Subscriber connection / disconnection events on the admin SSE debug feed.

What stays in log / error

  • Per-observation ingest (log.info / log.debug) — too high-volume for audit.
  • Per-attach-turn emissions — ditto.
  • Retry attempts, transient failures — log.warning.
  • Timeouts on the attach path (graceful degradation) — log.warning.
  • Queue overflow, exhausted retries, worker failures — error.
  • Exceptions swallowed by the hot path — error so they still land in error.log without propagating into the request path.

Rationale

Splitting the three channels keeps audit.log the single place an operator or admin-chat tool can scan when investigating a system-level state change without being drowned in routine telemetry. Separating error.log keeps every permanent-failure signal (data loss, persistent outage, contract violation) in one place regardless of which module it came from.

Failure Posture

  • Apollo slow on attach: the in-process guidance fetch is bounded by APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS (default 10 ms). On overshoot, oracle serializes the /chat response or MCP dispatch without apollo_guidance. Subscribers proceed without guidance on that turn — equivalent to pre-Apollo behavior. User sees no failure; metric apollo_guidance_attach_timeout_total surfaces the event.
  • Apollo unreachable as a process: since Apollo is a package inside oracle, "Apollo unreachable" means oracle is itself broken, which is a larger incident. If the Apollo module fails to import or initialize at startup, oracle continues serving /chat and tool dispatches without apollo_guidance attached. Ingest endpoint returns 503.
  • Ingest queue full: POST /api/v1/apollo/observations responds 202 but increments apollo_ingest_queue_dropped_total{service}. Never silent — visible on /stats under degraded_emitters.
  • Ingest client POST fails: client retries within budget, then drops the batch and increments apollo_ingest_post_failure_total{service, kind}. Visible on /stats. Emitter's task continues unaffected (observations are telemetry, not transactional).
  • Apollo worker crashes mid-ingest: at-least-once redelivery from the asyncio queue; observer dedupes on (trace_id, event_type, timestamp, service).
  • Apollo hallucinates a bad artifact: subscribers apply it on one turn; the Evaluator detects outcome degradation on subsequent observations and demotes; the next attached payload reflects the demotion. Admin can force-rollback via audit log at any time.
  • Curator goes rogue: every action (mutation + commit) is in the audit log; admin can pause_curator() immediately via chat or CLI. Paused Curator → artifact set stops changing → attached payloads continue to reflect the as-of-pause state until resume.

Apollo's LLM

Apollo runs its own LLM, separate from oracle's user-facing LLM routing. Apollo's LLM is the primary driver of synthesis, invoked per event (see §Learner → LLM synthesis).

Model: pluggable by design

The model is selected by configuration and must remain swappable without code changes. Apollo's LLM provider layer normalizes across providers so that a newer, stronger model can replace the current one as the state of the art advances.

Current default: MiniMax M2.7.

It is the best-available fit at the time of this spec given its context window, cost profile, and availability — but the spec is deliberately agnostic. Apollo must not encode MiniMax-specific assumptions in prompt shapes, input formats, or response parsers. The provider layer handles any per-model translation.

Operators can swap the model by changing env vars only:

APOLLO_LLM_PROVIDER=minimax            # current default; swap with any provider registered in the router
APOLLO_LLM_MODEL=m2.7                  # current default; replace with a newer model when available
APOLLO_LLM_API_KEY=...
APOLLO_LLM_BASE_URL=...                # for self-hosted or proxied inference
APOLLO_SYNTHESIS_MAX_CONCURRENT=4      # cap on concurrent synthesis calls (event bursts from L3)

The LLM router (oracle/oracle/llm.py) must support MiniMax as a first-class provider alongside anthropic / openai / groq / trinity / ollama in oracle's existing router. New providers register through the same interface — adding a model is an additive router change, never a change to Apollo's business logic.

Local MiniMax via HuggingFace (native, pre-trained, on-disk)

Apollo's LLM layer reserves a provider slot for a locally-stored, HuggingFace-pulled MiniMax model, selected with APOLLO_LLM_PROVIDER=minimax-local (see §Environment Configuration). It complements the default OpenAI-compatible endpoint path for air-gapped clusters, own-GPU inference, or operator-owned fine-tunes; the hosted path (openai provider) is unchanged and remains the default.

Canonical HuggingFace load signature. The provider MUST honor the model card's canonical from_pretrained call shape with trust_remote_code=True — required because MiniMax ships its own tokenizer and modeling code alongside the weights (a custom fine-tune needs the flag too):

tokenizer = AutoTokenizer.from_pretrained("MiniMaxAI/MiniMax-M2.7", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("MiniMaxAI/MiniMax-M2.7", trust_remote_code=True)

Apollo does not parse or override HuggingFace's on-disk cache layout (${HF_HOME:-~/.cache/huggingface}/hub/...). Pre-pulling the checkpoint (during image build or via an init container) is the recommended production pattern — the first cold from_pretrained downloads tens of gigabytes, unacceptable on the request path. MiniMax-M2.7 is large: plan for tens of GB on disk and a GPU/multi-GPU node with VRAM for the resolved context window; deployments that can't meet that budget stay on the hosted path.

Operator knobs (implemented). APOLLO_LLM_LOCAL_MODEL_PATH loads weights from outside the HF cache (passed verbatim as from_pretrained's first argument; unset → the stock MiniMaxAI/MiniMax-M2.7 checkpoint). APOLLO_LLM_LOCAL_DEVICE_MAP, APOLLO_LLM_LOCAL_TORCH_DTYPE, and APOLLO_LLM_LOCAL_LOAD_IN_4BIT / APOLLO_LLM_LOCAL_LOAD_IN_8BIT are additive from_pretrained overrides (4-bit wins if both bitsandbytes flags are set; an unrecognized dtype or a missing torch is logged and ignored rather than fatal).

Ships today vs. deferred. The minimax-local provider is implemented in oracle/oracle/llm.py (Milestone 8): lazy transformers import, the canonical load signature above, the operator knobs just listed, and prompt completion where weights and deps are present. Still deferred (tracked in the dev-plan): thread/process-pool offload of the synchronous forward pass (today it runs inline on the event loop); pre-pull orchestration + readiness gate; token streaming through the provider abstraction. Until those land, minimax-local is a dev / air-gapped-lab fallback and the production default stays APOLLO_LLM_PROVIDER=openai against an OpenAI-compatible MiniMax endpoint.

Separation from oracle's user-facing LLM

Apollo's LLM configuration is independent of oracle's user-facing LLM routing. The two can use the same provider or different providers; the same model or different models. Apollo's usage is tracked separately via the Meter (SPEC-03 §Metering) under client id apollo. User-facing chat metrics and Apollo metrics are separate in dashboards.

Budget isolation

A burst of synthesis calls triggered by a long fusion run must not starve user-facing chat. Apollo's LLM has its own rate limit, its own quota, and its own metering client id. When Apollo's quota is exceeded it defers synthesis (the event queue holds triggers up to a cap); user-facing oracle chat is unaffected.

Drift Prevention

Apollo influences a large fraction of the system. Drift in its artifacts cascades into the prompts of layer 1 and the outputs of layer 3. The spec encodes several anti-drift guarantees:

  1. Observation cadence is fixed and coarse. No per-token events. High-signal-to-noise ratio in the raw data.
  2. Graphs are the deterministic anchor. Decision graphs update on every observation via rule-based extractors — they never create free-form artifacts and cannot drift on their own. They record what actually happened, not what the LLM thinks happened.
  3. Every LLM output is checked against the graphs. The LLM is the primary driver of synthesis, but every artifact, promotion, and pattern it proposes is validated against current graph state and trajectory before the Curator commits it. Proposals that diverge from graph-recorded reality are flagged as DriftEvent and held for admin review.
  4. Drift is detected structurally, not rate-limited. Short-window vs. long-window edge-weight divergence, rate-of-new-nodes caps, LLM-output-vs-graph divergence, and trajectory breaks distinguish smooth evolution (allowed) from sudden shift (flagged). Apollo can learn continuously because the graphs provide a rigid referent.
  5. Curator is bounded. Cannot touch auth, guardrails, or user data — only Apollo's own artifacts.
  6. Every Curator action is auditable. Admin can see what changed, when, why, and by whom.
  7. Every artifact is versioned. Rollback is always possible.
  8. Evaluator closes the loop. Artifacts that stop correlating with good outcomes decay automatically.
  9. Admin can pause the Curator. An emergency off-switch prevents runaway mutation.
  10. Guidance degrades gracefully. If Apollo is slow or wrong, oracle falls through without injection — the base system still functions.

Environment Configuration

Apollo does not redefine any env var that already exists in the platform deployment layer. The canonical source for deployment-level configuration is developers-environment/conf/*.env — one file per target (development.axonis.ai.env, matrix.axonis.ai.env, edge.axonis.ai.env, vector.axonis.ai.env, etc.). Every target ships a consistent platform baseline; Apollo inherits it transitively through axonis-core, oracle, and its own storage/logger dependencies.

Inherited platform variables (not Apollo's to define)

Variable(s) Consumer Apollo's use
ELASTIC_HOST, ELASTIC_USERNAME, ELASTIC_PASSWORD, ELASTIC_VERIFY, ELASTIC_TIMEOUT, ELASTIC_TEMPLATES, ELASTIC_PKI_CA axonis.elastic.Elastic Storage for apollo_observations, apollo_artifacts, apollo_graph_*, apollo_audit. Every Memory(UDS) subclass in apollo/memory/store.py inherits this config.
REDIS_URL (oracle-style) or REDIS_HOST + REDIS_PORT + REDIS_PASSWORD + REDIS_DB + REDIS_TLS + REDIS_VERIFY (platform-standard) oracle/server/memory/*, axonis.redis.Redis Oracle's ConversationStore + CrossServiceMemory; unused directly by Apollo.
SSO_CLIENT_ID, SSO_CLIENT_SECRET, SSO_TOKEN_URL, SSO_WELLKNOWN, SSO_INTROSPECT_URL, SSO_VERIFY oracle's OAuthMiddleware (+ axonis.auth) Validates Bearer tokens on every request reaching /api/v1/apollo/*. No Apollo-specific auth config.
ATLAS_LOG_LEVEL, ATLAS_WORKSPACE, AXONIS_LOG_LEVEL, AXONIS_WORKSPACE axonis.logger (§Logging) Log level + log-file root for Apollo's three logger streams (log/error/audit). oracle/tests/conftest.py also respects ATLAS_WORKSPACE for test-session log placement.
FEDERATE_DOMAIN, FEDERATE_NAME, FEDERATE_UUID, FEDERATE_PARTY_*, FEDERATE_PROTOCOL_*, FEDERATE_WORK_MODE_* axonis.uds federation hooks Picked up automatically if/when Apollo artifacts start federating (post-Milestone 13). No Apollo-specific federation config.

Apollo-owned variables (all APOLLO_*)

Canonical location: developers-environment/conf/*.env — specifically the shared dev-env file (development.axonis.ai.env) plus any target-specific overrides (matrix.axonis.ai.env, vector.axonis.ai.env, edge.axonis.ai.env). Every APOLLO_* variable is declared there with a production-ready default. oracle/oracle/settings.py reads them via os.getenv(...) with fall-back defaults that match the env-file values, so if the shared env is unsourced the system still comes up sensibly — but the authoritative source is the deployment env file.

Why it lives in the shared env file rather than per-service: Apollo's observation path runs in oracle, but its configuration surface informs the contract every other service consumes (guidance attach budgets, trace-propagation expectations, retention windows). Keeping the defaults in the shared env file means oracle, parallax, cortex, and beacon all load the same baseline — an operator flipping APOLLO_CURATOR_AUTONOMOUS=true in the shared file affects the whole deployment consistently.

Every APOLLO_* variable Apollo's settings.py reads is mirrored in the env file. Grouped by subsystem:

  • LLM: APOLLO_LLM_PROVIDER, APOLLO_LLM_MODEL, APOLLO_LLM_API_KEY, APOLLO_LLM_BASE_URL; local-MiniMax knobs APOLLO_LLM_LOCAL_MODEL_PATH, APOLLO_LLM_LOCAL_DEVICE_MAP, APOLLO_LLM_LOCAL_TORCH_DTYPE, APOLLO_LLM_LOCAL_LOAD_IN_4BIT, APOLLO_LLM_LOCAL_LOAD_IN_8BIT
  • Synthesis: APOLLO_SYNTHESIS_MAX_CONCURRENT
  • Ingest client side: APOLLO_INGEST_BATCH_SIZE, APOLLO_INGEST_FLUSH_INTERVAL_MS, APOLLO_INGEST_POST_TIMEOUT_SEC, APOLLO_INGEST_RETRY_ATTEMPTS, APOLLO_INGEST_RETRY_BASE_MS, APOLLO_INGEST_RETRY_CAP_MS
  • Decision Graphs: APOLLO_GRAPH_SNAPSHOT_INTERVAL, APOLLO_GRAPH_EWMA_SHORT, APOLLO_GRAPH_EWMA_LONG, APOLLO_GRAPH_TRACE_STATE_TTL_SEC
  • Evaluator: APOLLO_EVALUATOR_WEIGHT_L3_ERROR, APOLLO_EVALUATOR_WEIGHT_SCHEMA_MISMATCH, APOLLO_EVALUATOR_WEIGHT_USER_FEEDBACK, APOLLO_EVALUATOR_WEIGHT_CONFIDENCE, APOLLO_EVALUATOR_L3_FAST_DEMOTE_N, APOLLO_EVALUATOR_NORMAL_DEMOTE_N
  • Curator: APOLLO_CURATOR_AUTONOMOUS, APOLLO_CURATOR_AUTO_INTERVAL_SEC
  • Maintenance: APOLLO_MAINTENANCE_INTERVAL, APOLLO_OBSERVATION_RETENTION_DAYS
  • Trace propagation: APOLLO_TRACE_HEADER, APOLLO_REQUIRE_TRACEPARENT
  • Observation obligations: APOLLO_REQUIRE_INTENT_SCHEMA
  • Integration (ApolloClient): APOLLO_BASE_URL

None of these duplicate a platform variable. When adding a new APOLLO_*, add it to both oracle/oracle/settings.py (with its default) and the shared env file (with the same default) in one commit.

Per-deployment overrides

Each *.env in developers-environment/conf/ targets a specific deployment (development, matrix, vector, edge, etc.). The shared development.axonis.ai.env holds the baseline; production targets override via their own file. Any Apollo variable that needs to differ per target lives in the target-specific env — never hardcoded into settings.py. Operators change behavior by editing the env file and reloading, not by shipping code.

Dependencies

Apollo declares no new top-level dependencies — every library it needs is already in oracle's oracle/pyproject.toml. It inherits oracle's stack (axonis-core, FastAPI/Starlette, redis, the anthropic and openai provider SDKs) and activates two libraries oracle already ships but that are otherwise dormant: sentence-transformers (embeddings) and numpy (dense-vector math).

LLM provider SDK. Apollo's current default LLM is MiniMax M2.7 (see §Apollo's LLM). MiniMax exposes an OpenAI-compatible API, so Apollo reaches it via the existing openai client with APOLLO_LLM_BASE_URL pointed at the MiniMax endpoint — no new SDK dependency is added. If a future model swap requires a non-OpenAI-compatible provider, an additive dependency joins oracle's existing provider set.

Invariants

  1. Apollo does not execute workflows. It observes, learns, and advises. It never calls tools, never invokes backend services, never retries a failed request. Layer 1 drives iteration.
  2. Curator empowerment is bounded to Apollo's own artifacts. Curator cannot change auth, guardrails, token scopes, user conversations, or any non-Apollo state.
  3. Every autonomous action is auditable. No Curator mutation occurs without a record in apollo_audit.
  4. Apollo is internal. No Apollo endpoint is exposed outside the cluster except through oracle's existing external surface. Oracle remains the only externally exposed service (SPEC-03 invariant 1).
  5. Apollo failures do not break oracle. Guidance attachment has a hard timeout (APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS); on overshoot or internal Apollo failure, oracle serializes the response / MCP dispatch without apollo_guidance. Ingest failures never block the emitter's task and are surfaced as metrics (apollo_ingest_post_failure_total, apollo_ingest_queue_dropped_total) on /stats — never silent.
  6. Apollo uses axonis-core's Memory UDS as its storage primitive. It does not re-implement or bypass the UDS pattern from SPEC-01.
  7. Admin chat is the only conversational surface. Role admin is required. Non-admin users interact with Apollo only transitively via oracle.
  8. Observation cadence is coarse by design. Token-level observations are prohibited. Turn-level, tool-level, error-level, and response-level only.
  9. Axonis-core remains ML-free. Any future ML dependencies (e.g., embedding generation) live in oracle/oracle/, not in axonis-core. SPEC-01 invariant 1 is preserved.
  10. Artifacts are versioned; graphs are snapshotted. Every Curator mutation to an artifact creates a new version in apollo_artifact_history; graph-level rollback uses the hourly/daily/weekly snapshot tiers. Rollback is always possible.
  11. Oracle's existing memory modules are not modified. Apollo is additive and coexists with oracle/server/memory/* and oracle/server/models/memory.py throughout all three phases. Consolidation of the oracle memory modules is deferred and out of scope for this spec (tracked in the dev-plan).
  12. Apollo's LLM is pluggable. No MiniMax-specific assumptions in prompts, input shapes, or response parsers. Model swap is a config change via APOLLO_LLM_PROVIDER / APOLLO_LLM_MODEL, never a code change.
  13. Layer 3 performance is the strongest failure signal. Evaluator weighting amplifies L3 errors and schema mismatches over softer signals, accelerates demotion on L3-dominant score drops, and cascades to flag upstream artifacts for synthesis review.
  14. Neither L1 nor L3 addresses Apollo directly. L1 talks to oracle; L3 talks to oracle (via MCP); oracle talks to Apollo. L1 and L3 hold no Apollo endpoint knowledge, no Apollo credentials, and make no Apollo calls on any in-production path. Oracle is Apollo's sole emitter for all Phase-1 events: L1-origin observations (intent_schema, user_prompt, user_feedback) are emitted by oracle in-process after oracle receives the corresponding signal from L1; L3-origin observations (tool_output, tool_error) are emitted by oracle in-process after the MCP round-trip to an L3 service returns. Guidance flows the same way in reverse: it reaches L1 attached to /chat responses, reaches oracle's own chat LLM in-process (no transport, since oracle hosts Apollo), and reaches L3 attached to outbound MCP tool dispatches. The POST /api/v1/apollo/observations endpoint exists as a secondary path for admin replay/seed and for future services running outside oracle's MCP dispatch reach; Phase-1 emitters do not use it. No long-lived connections, no service tokens, no push channel in production.
  15. Injection cannot execute code in any subscriber. Attached apollo_guidance (or in-process cache contents on the L2 path) carries artifact data only. Subscribers update a local cache and consult it on their next LLM turn. Apollo cannot force a subscriber to act, call a tool, or mutate any state beyond its own cache.
  16. No subscriber registry, no push channel. Apollo has no list of subscribers to push to. Guidance is delivered by oracle attaching the current applicable set to every response/dispatch leaving oracle (L1 attach, L3 attach) and consulted in-process by oracle's own chat LLM on the L2 path. L3 agent eligibility is still governed by component_kind on the ServiceRegistry record (libraries are filtered out before attachment); L1 eligibility is implicit (every /chat response carries L1 guidance); L2 consumption is implicit (oracle's tool-executor consults the local cache before every turn).
  17. Apollo is the cross-service knowledge transfer channel. MemoryService (axonis-core) is strictly per-service — every recall is scoped to the calling service's (user_id, service). A preference, fact, or instruction expressed to one service is never directly readable by another. When the same intent needs to shape behaviour across services (e.g. "user prefers concise responses" expressed to beacon should also bias oracle), Apollo's observation stream picks it up, synthesis distills it into an artifact (e.g. a PromptShim with applicability.service_name = null for cross-service scope), and the guidance attach channel delivers it to every applicable subscriber. Apollo never instantiates MemoryService for cross-service reads — its view is the observation stream, which inherently spans all services. This separation means cross-service knowledge transfer is always curated, audited, and reversible (demote / forget) rather than implicit through silent shared-index reads.

Test Expectations

  • Observer tests: each event type round-trips correctly through ingest; trace_id and parent_trace_id stitching works; cadence limits are enforced (no token-level events accepted); every Phase-1 event — L1-origin (intent_schema, user_prompt, user_feedback), oracle's own (llm_turn, final_response), and L3-origin (tool_output, tool_error) — arrives via oracle's in-process emission path only. Oracle extracts L1 signals from /chat request body and feedback submissions, observes the MCP round-trip for L3 outputs, and calls oracle.oracle.observer.ingest in-process on both layers' behalf. A direct emit from L1 credentials or from cortex to any Apollo path is rejected in Phase-1 test fixtures (§Invariants 14).
  • HTTP ingest tests (secondary path): the POST /api/v1/apollo/observations endpoint continues to function for admin replay/seed and for out-of-process emitters. ApolloClient.emit POSTs the envelope and traceparent with an appropriate Bearer token (admin token for replay, user-forwarded token for out-of-process emitters); server returns 202 as soon as the envelope is enqueued on the in-process async queue; queue overflow increments apollo_ingest_queue_dropped_total and is visible on /stats (never silent); client retries on transient failures within the configured attempt budget, then surfaces apollo_ingest_post_failure_total on permanent failure; at-least-once redelivery is deduped on (trace_id, event_type, timestamp, service) within APOLLO_INGEST_DEDUPE_WINDOW_SEC; background worker crashes move envelopes to the optional dead-letter JSONL path when retry budget exhausts; services over lag/staleness thresholds appear in degraded_emitters.
  • Memory tests: observations, artifacts, graph nodes, graph edges, and graph snapshots indices support CRUD via the axonis-core Elastic base class; embeddings generated on store; semantic recall via kNN composes with filters; expires_ts + delete_by_query maintenance task coarsens and purges correctly.
  • Graph update tests: extractors are deterministic on every observation; node/edge upserts are idempotent; EWMA short- and long-window weights update correctly; no artifacts are created on the deterministic path.
  • Synthesis tests: each event-driven trigger (L1 request, L3 output, admin chat, guidance miss, admin-initiated) invokes the LLM once; concurrent synthesis is bounded by APOLLO_SYNTHESIS_MAX_CONCURRENT; duplicate triggers within a trace_id are coalesced to the latest observation; synthesis calls receive the correct subgraph and artifact context.
  • Graph-anchor drift check tests: LLM proposals consistent with graph state are committed; proposals that contradict strongly-weighted edges are flagged as DriftEvent and held for admin review; rate-of-new-nodes cap triggers drift flagging.
  • Guidance tests: intent → artifacts matching; layer filtering; caller-permission filtering (guardrails); empty-result fallback when artifacts index is empty; 50 ms timeout on the hot path.
  • Evaluator tests: all four failure signals detected; L3 performance signals (1 and 2) weight heavier than user feedback and confidence; accelerated demotion (N=2) fires on L3-dominant score drops; upstream artifact re-flag cascade reaches IntentPattern / PromptShim / SpecFragment; repeated L3 failures escalate to DriftEvent rather than silent demotion; per-signal score decomposition is preserved in audit records.
  • Curator tests: each allowed action (promote, demote, forget, edit, rollback, compact); every disallowed action is refused (auth changes, guardrail changes, user-data access); audit record written for every mutation with actor, trigger, before/after version; curator-pause blocks all Curator mutations.
  • Versioning tests: artifact mutation copies prior version to apollo_artifact_history before overwrite; rollback restores the target version and creates a new version whose prev_version_id points at the post-rollback state; rollback event itself appears in audit; graph snapshots restore correctly; structural graph mutations by admin are audited.
  • Admin chat tests: role gating (admin only); each chat tool executes correctly; audit log shows actor: "admin:<username>"; indefinite: true flag works for critical actions.
  • Layer 1 schema tests: best-effort mode accepts traces without intent_schema and produces inferred nodes; required mode (APOLLO_REQUIRE_INTENT_SCHEMA=true) rejects schema-less traces with 400; intent_schema_coverage stat reports correct rolling percentage.
  • LLM swap tests: provider swap via env (APOLLO_LLM_PROVIDER / APOLLO_LLM_MODEL) takes effect without code changes; MiniMax-via-OpenAI-compatible endpoint is exercised; no MiniMax-specific strings leak into prompt or response parsers.
  • Injection channel tests: oracle attaches apollo_guidance to every /chat response body when an applicable artifact set exists for the caller's L1 scope; oracle attaches apollo_guidance to every MCP dispatch bound for an agent-kind L3 service; dispatches bound for library-kind services do not carry apollo_guidance; attached payload contains as_of, artifacts, and rationale_summary (no injection_id, trigger, or evidence_ref — those are audit-only); attach-timeout overshoot (APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS) causes omission of apollo_guidance with an apollo_guidance_attach_timeout_total increment, not a request failure; Curator pause freezes the attached state (subscribers keep receiving the as-of-pause payload); L1 and L3 make no calls to Apollo endpoints in any test fixture — attempts from non-admin tokens to admin preview endpoints return 403.
  • component_kind tests: ServiceRegistry records carry a component_kind field (agent | library); oracle attaches apollo_guidance only to MCP dispatches bound for agent-kind services; library-kind services emit observations but receive no apollo_guidance in their dispatches; Evaluator signal 2 (schema_mismatch) fires only for agent-emitted tool_output with an L1 intent schema on the trace, and is skipped for library-emitted events; re-registering a service with a changed component_kind takes effect on the next dispatch without Apollo redeploy.
  • ApolloGuidanceCache SDK tests: cache.update(payload) replaces the full artifact set idempotently; each canonical accessor (get_system_prompt_additions, get_spec_fragments, get_tool_description_overrides, get_tool_pairing_hints, get_active_failure_patterns, get_service_connection_hints) returns correctly-ordered (weight desc, recency desc) results; applicability filtering narrows by intent context; empty-cache fallback returns empty lists / None without blocking; the SDK holds no transport, no HTTP client, no auth state — it is a pure in-process data structure.
  • Rationale + evidence tests: every attached apollo_guidance.artifacts[*] entry carries a non-empty rationale; every apollo_audit record carries a non-empty rationale and evidence_ref; LLM-driven actions produce synthesized rationales, deterministic Evaluator-driven actions produce templated rationales composed from score_decomposition; rationale and admin_note are distinct and both queryable; admin chat explain_decision(trace_id | artifact_id | audit_id) retrieves the stored rationale and resolves evidence_ref pointers to their underlying observations / graph snapshot / score decomposition; discuss_decision(artifact_id | audit_id) opens a chat thread with the rationale pre-loaded and permits inline action tool calls that themselves are audited with actor: "admin:<username>".
  • Trace propagation tests: L1-minted traceparent arrives unchanged at oracle; oracle forwards the header unchanged on downstream MCP and REST dispatches via axonis_core.gateway.client.extract_http_headers(); MCP context field carries traceparent end-to-end; ApolloClient stamps both the header and envelope trace_id on every POST /observations; oracle mints a replacement and logs missing_traceparent when the header is absent (best-effort); oracle returns 400 when APOLLO_REQUIRE_TRACEPARENT=true and the header is absent or malformed; envelope trace_id wins when it differs from the header; a full lineage query returns every event for a single trace_id across all emitting layers.
  • Integration tests: full lineage from Layer 1 intent_schema + user_prompt through oracle llm_turn and Layer 3 tool_output / tool_error to final_response, with observations captured at every boundary and artifacts produced by synthesis reflecting the lineage.

Depends on: component.oracle.gateway, platform.axonis-core, platform.service-contract