
Apollo — System-Wide Observation, Learning, and Guidance Layer

Status: Design
Package: oracle.apollo (lives inside oracle, not exposed as a separate service)
Depends on: platform.axonis-core, platform.service-contract, component.oracle.gateway
Milestone: P3 (after oracle is operational in production)

Purpose

Apollo is the reasoning and memory layer over the platform's LLM activity. It sits inside oracle, observes every LLM and tool interaction across a three-layer system, distills durable artifacts from those observations, and exposes learned guidance back to the layers that need it.

Apollo is an observer, learner, and advisor. It does not execute workflows, does not call tools, does not retry failed requests, and does not interrupt live LLM calls. Iteration is driven by layer 1 (the front-end prompt/schema generator); Apollo's role is to make each successive iteration better informed than the last.

Apollo has its own LLM, its own memory, and an autonomous curator that maintains its own artifacts — but empowerment is strictly bounded to Apollo's internal state. Apollo cannot change auth, guardrails, token scopes, or user data.

Three-Layer Context

┌──────────────────────────────────────────────────────────────┐
│  Layer 1: Front-end                                          │
│    - generates prompts + schemas for requests                │
│    - consumes Apollo's guidance to shape future prompts      │
│    - decides when to re-run a workflow                       │
└────────────────────┬─────────────────────────────────────────┘
                     │  intent, prompt, schema
                     ▼
┌──────────────────────────────────────────────────────────────┐
│  Layer 2: Oracle / Apollo                                    │
│    Oracle: auth, routing, LLM dispatch, tool aggregation     │
│    Apollo: observe, learn, advise, curate                    │
└────────────────────┬─────────────────────────────────────────┘
                     │  tool calls, sub-LLM calls
                     ▼
┌──────────────────────────────────────────────────────────────┐
│  Layer 3: Backend agents + libraries                         │
│    - agents (parallax, cortex, ...) — LLM-driven             │
│    - libraries (UDS, uds.*, ...) — operational, no LLM       │
│    - execute domain logic, return outputs                    │
│    - oracle observes the MCP round-trip and emits on their   │
│      behalf (L3 never addresses Apollo directly)             │
│    - agents consume injected guidance; libraries do not      │
└──────────────────────────────────────────────────────────────┘

Apollo observes the full lineage of each request: intent (layer 1) → routing and LLM reasoning (layer 2) → execution and outcome (layer 3) → final response back to layer 1.

Apollo's reasoning output reaches the LLMs in L1 and L3 by attaching guidance to the existing request/response flow. The same piggybacking mechanism is used toward both layers:

  • L1: oracle attaches current applicable guidance to every /chat response body.
  • L3: oracle attaches current applicable guidance to every outbound MCP tool dispatch bound for an agent-kind service.

Both paths ride the ambient auth of the envelope they are embedded in, so no service-token infrastructure is needed and no long-lived connection is maintained. Guidance is always fresh (computed at request time, inside oracle, with Apollo as an in-process call). See §Injection Channel.

Communication with L1 and L3 is unidirectional: oracle attaches Apollo's guidance to outbound envelopes (responses to L1, MCP dispatches to L3); L1 and L3 never query Apollo for guidance and never emit observations to Apollo directly. Oracle is Apollo's sole emitter — it extracts L1 signals from /chat request bodies and it observes L3 outputs by watching the MCP round-trip, calling oracle.apollo.observer.ingest in-process on both layers' behalf (§Invariants 14, §Ingest Semantics). Oracle is also a guidance subscriber for its own chat LLM (L2): the tool-executor at oracle/server/llm/tool_executor.py consults a process-local ApolloGuidanceCache on each turn — no transport involved (§L2 path). Admin tooling is the only exception — admins use GET /api/v1/apollo/guidance and related endpoints for inspection, and may POST to /api/v1/apollo/observations for replay/seed.

Apollo does not observe L1's or L3's internal LLM turns. llm_turn events are emitted by oracle (L2) only. Apollo learns about L1, L2, and L3 LLMs indirectly — from L1/L3 outputs (intent_schema / user_prompt for L1; tool_output / tool_error for L3), from oracle's own llm_turn and final_response (L2), and from outcome correlation — and improves them prospectively by injecting updated guidance into their prompt context.

Trace Propagation

Apollo relies on a single trace_id shared across every observation emitted for one end-to-end request. Trace propagation follows the W3C Trace Context standard (traceparent header) end-to-end across L1, L2, and L3.

This is the concrete realization of the OpenTelemetry aspiration noted in component.oracle.gateway (Oracle) and introduces no conflict with axonis-core: neither axonis-core nor platform.axonis-core/02/03 currently defines a trace, request-id, or correlation-id header. The only header propagation today is Authorization via axonis_core.gateway.client.extract_http_headers() (platform.axonis-core). Apollo's adoption is additive.

Header

traceparent: 00-<trace-id 32 hex>-<parent-span-id 16 hex>-<flags 2 hex>

Format: W3C Trace Context Level 1. Apollo uses only the trace-id segment for lineage stitching; parent-span-id and flags are preserved for standards compliance and future OpenTelemetry interop but are not interpreted by Apollo's lineage layer.

Who mints

  • L1 mints the root traceparent on every new request and sets it on the outbound HTTP call to oracle /chat (and equivalent endpoints). L1 does not call Apollo directly (§Invariants 14); oracle re-emits L1-origin observations in-process, reusing the same trace-id.
  • If oracle receives a request without a traceparent header (e.g., a pre-W3C client), oracle mints one, logs a missing_traceparent telemetry event, and surfaces the minted trace-id in the response so callers can correlate if they choose.

How it travels

| Hop | Carrier |
|---|---|
| L1 → L2 (HTTP to oracle) | traceparent request header |
| L2 → L3 (MCP tool dispatch) | traceparent HTTP header on the POST to the service's MCP endpoint (same transport as the existing Authorization forward) |
| L2 → L3 (HTTP fallback, non-MCP) | traceparent request header |
| L3 → L2 (MCP response → oracle) | traceparent is preserved by oracle's MCP client across the round-trip; oracle stamps the same trace_id on the tool_output / tool_error envelope it emits in-process |
| Admin seed → Apollo (POST /observations) | traceparent request header and trace_id field in the envelope |
| Out-of-process emitter → Apollo (secondary) | traceparent request header and trace_id field in the envelope (envelope is authoritative) |

Oracle is the only L2 hop and is responsible for forwarding the inbound traceparent unchanged on every downstream call that belongs to the same request. Oracle never re-mints mid-request.
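
The forwarding rule above can be sketched as a small allow-list copy. This is a minimal illustration, not the actual body of extract_http_headers(): the function name propagated_headers and the FORWARDED_HEADERS tuple are hypothetical, and the only facts taken from this document are that Authorization is forwarded today and traceparent is to be forwarded unchanged.

```python
from typing import Mapping

# Hypothetical allow-list. The document names only these two headers.
FORWARDED_HEADERS = ("authorization", "traceparent")


def propagated_headers(inbound: Mapping[str, str]) -> dict[str, str]:
    """Copy the forwarded headers from an inbound request, values untouched.

    Header names are matched case-insensitively; everything else is dropped.
    """
    lowered = {name.lower(): value for name, value in inbound.items()}
    return {name: lowered[name] for name in FORWARDED_HEADERS if name in lowered}


inbound = {
    "Authorization": "Bearer example-token",
    "Traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
    "Cookie": "session=abc",
}
outbound = propagated_headers(inbound)
```

The value is copied byte for byte, which is what "never re-mints mid-request" requires of every downstream call in the same request.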

axonis-core integration

Trace header propagation ships as an additive change to axonis-core — it lives with the existing cross-service header plumbing, not in oracle-only code:

  • axonis_core.gateway.client.extract_http_headers() — extended to forward traceparent alongside Authorization. This is the single source of truth for cross-service header propagation and is used by both MCPClient and RestClient.
  • axonis_core.gateway.mcp_client.MCPClient — reads traceparent from the inbound request context and sets it as an HTTP header on outbound MCP POSTs, alongside the existing Authorization forward.
  • axonis_core.gateway.rest_client.RestClient — reads traceparent from the inbound context and sets it as an HTTP header on outbound REST calls.
  • ApolloClient (component.oracle.apollo §Ingest Semantics, in axonis-core) — used by admin replay and any future out-of-process emitter; reads the ambient traceparent from request context and sets it as an HTTP header on every POST /api/v1/apollo/observations call and into the envelope's trace_id field (the envelope wins on conflict — see §Envelope mapping). Phase-1 emitters do not use this client; oracle emits in-process and carries trace_id on the envelope it builds directly.

No new dependency is added to axonis-core — parsing the 4-segment traceparent string is a handful of lines; no OpenTelemetry SDK is required. A future OpenTelemetry integration can consume the same header without change.
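
To make the "handful of lines" claim concrete, here is a sketch of a standard-library parser for the 4-segment header. The names TraceParent and parse_traceparent are illustrative, not existing axonis-core symbols. The validity rules (lowercase hex, version ff invalid, all-zero trace-id or parent-id invalid) follow W3C Trace Context Level 1.

```python
import re
from typing import NamedTuple, Optional

# version "-" trace-id "-" parent-id "-" flags, all lowercase hex.
_TRACEPARENT_RE = re.compile(
    r"(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<parent_span_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})"
)


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_span_id: str
    flags: str


def parse_traceparent(value: str) -> Optional[TraceParent]:
    """Return the parsed header, or None when it is malformed."""
    match = _TRACEPARENT_RE.fullmatch(value.strip())
    if match is None or match["version"] == "ff":
        return None
    if set(match["trace_id"]) == {"0"} or set(match["parent_span_id"]) == {"0"}:
        return None  # all-zero ids are invalid per the W3C spec
    return TraceParent(**match.groupdict())


parsed = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
```

Only parsed.trace_id would feed Apollo's lineage layer; the other segments are carried along untouched.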

Envelope mapping

Apollo's observation envelope fields map to W3C Trace Context as follows:

| Envelope field | W3C source | Purpose |
|---|---|---|
| trace_id | traceparent.trace-id (32-hex) | Shared by all events for one end-to-end request |
| parent_trace_id | not derived from traceparent | Set by emitter only when this trace is a sub-request spawned from a separate enclosing trace (e.g., a scheduled background workflow). Null otherwise. |

parent_trace_id is not the same as W3C parent-span-id. Apollo does not track span hierarchy within a single trace — its per-event observation cadence (§Observation Model → Observation cadence) makes span-level granularity unnecessary. parent_trace_id is used only for cross-trace fork linkage.

Configuration

  • APOLLO_TRACE_HEADER — header name. Default traceparent (W3C). Configurable only to ease staged rollout against pre-W3C emitters; always traceparent in production.
  • APOLLO_REQUIRE_TRACEPARENT — when true, oracle rejects inbound requests without a valid traceparent. Default false through Phases 1–2 (oracle mints on absence). Flip to true in Phase 3 alongside APOLLO_REQUIRE_INTENT_SCHEMA once emitter coverage is proven.

Failure posture

  • Missing header (best-effort): oracle mints, logs missing_traceparent, serves the request. Lineage still stitches because the minted trace-id flows downstream and is used by oracle's own observations.
  • Missing header (required mode): oracle rejects with 400; emitter must include traceparent.
  • Malformed header: oracle rejects with 400 in required mode; logs malformed_traceparent and mints a replacement in best-effort mode.
  • Envelope trace_id differs from header: the envelope value wins — it is the emitter's authoritative signal. Oracle logs the discrepancy for diagnostics.

Package Structure

oracle/
  apollo/
    __init__.py
    observer/
      __init__.py
      ingest.py                  # observation normalization + routing into memory
      events.py                  # event type definitions (Pydantic models)
    memory/
      __init__.py
      store.py                   # wraps axonis-core Memory UDS + ElasticQuery
    learner/
      __init__.py
      synthesis.py               # event-driven LLM synthesis dispatcher (primary driver)
      graphs.py                  # Decision Graph set: nodes, edges, weights, mutations (supplemental anchor)
      extractors.py              # observation → decision points (deterministic; feeds graphs)
      snapshots.py               # versioned graph snapshots for past/current temporal analysis
      trajectory.py              # projection of future decisions from current graph state
      drift.py                   # graph-anchor check on LLM outputs; drift-vs-evolution detection
      prompts.py                 # prompt templates for the synthesis LLM
    guidance/
      __init__.py
      api.py                     # admin inspection endpoints (GET /guidance*)
      attacher.py                # in-process helper oracle calls to attach guidance to responses and MCP dispatches
      selectors.py               # intent → artifacts matching logic
    curator/
      __init__.py
      actions.py                 # promote / demote / forget / edit / compact
      policy.py                  # bounded-empowerment rules
      audit.py                   # audit log writer (Elastic `apollo_audit` index)
    evaluator/
      __init__.py
      scoring.py                 # grades artifacts by outcome correlation; L3-performance amplification
      signals.py                 # failure-signal detectors (see §Evaluator)
      cascade.py                 # upstream-artifact re-flag on L3-driven score drops
    chat/
      __init__.py
      server.py                  # admin-only conversational interface
      tools.py                   # memory-management tools exposed to Apollo's own LLM
    artifacts.py                 # typed artifact schemas (IntentPattern, FailurePattern, ...)
    llm.py                       # Apollo's own LLM client (separate config)
    settings.py                  # env-driven configuration
  server/
    __main__.py                  # mounts /api/v1/apollo/* routes from oracle.apollo.guidance + chat

Apollo is a package inside oracle, mounted into oracle's existing Starlette app at /api/v1/apollo/*. It is not a separate service and does not have its own __main__.py. The oracle invariant ("oracle is the only externally exposed service" — component.oracle.gateway §Invariants 1) is preserved.

Observation Model

Event types

Apollo recognizes the following event types. In Phase 1 all of them are emitted by oracle, in some cases on behalf of L1 or L3:

| Event type | Emitter | Purpose |
|---|---|---|
| intent_schema | Oracle (from L1 /chat request body) | Front-end's generator schema for this request |
| user_prompt | Oracle (from L1 /chat request body) | Concrete prompt produced from the intent schema |
| llm_turn | Oracle (layer 2) | One LLM request/response cycle inside oracle |
| tool_output | Oracle (from L3 MCP response) | Successful tool execution: inputs, outputs, latency |
| tool_error | Oracle (from L3 MCP response) | Tool failure: inputs, error message, stack trace, latency |
| final_response | Oracle | What was returned to layer 1 at the end of a conversation turn |
| user_feedback | Oracle (from L1 feedback submission) | Thumbs up/down, correction, explicit follow-up signal |

Emission paths are covered in detail in §Ingest Semantics. In summary: every Phase-1 event (L1-origin, L2-origin, and L3-origin) is emitted by oracle in-process — neither L1 nor L3 addresses Apollo directly (§Invariants 14). POST /api/v1/apollo/observations remains mounted for admin replay/seed and for future services running outside oracle's MCP dispatch reach.

Observation payload

All observations share a common envelope:

{
  "event_type": "tool_output",
  "trace_id": "trc_abc123",
  "parent_trace_id": "trc_def456",
  "conversation_id": "conv_xyz",
  "service": "parallax",
  "timestamp": "2026-04-17T10:30:00Z",
  "caller_identity": {"username": "...", "roles": [...]},
  "emitted_by": {"token_subject": "...", "token_roles": [...], "context": "http"},
  "payload": { ... event-specific fields ... }
}

caller_identity vs emitted_by. Apollo records two attribution axes per observation:

  • caller_identity — application-asserted. Who the work is attributed to. Set by the emitter (often a service token stamping observations on behalf of an end user — e.g., cortex emits caller_identity.username="alice" because alice's /chat request fanned out to cortex). The handler stamps this from the Bearer token only when the envelope didn't carry one.
  • emitted_by — server-stamped, unforgeable. Who actually pushed the bytes. Always overwritten by Apollo's ingest handler (HTTP path) or in-process emit helper (oracle hosts Apollo). Carries the validated token subject, roles, and a context ("http" or "in_process"). Emitters cannot forge it; the handler ignores any inbound value and stamps from request.state.token_payload.

Audit query: rows where caller_identity.username != emitted_by.token_subject and emitted_by.token_subject is not a known service principal → flag for review. The two-axis model preserves the legitimate cross-attribution pattern (services emitting per-user observations) while making forging detectable.
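
A minimal sketch of the two-axis model and the audit predicate follows. It assumes the validated token payload exposes sub and roles keys, and it uses a made-up KNOWN_SERVICE_PRINCIPALS set; neither is specified in this document.

```python
from typing import Any

# Hypothetical principals. The document does not list the real ones.
KNOWN_SERVICE_PRINCIPALS = {"svc-cortex", "svc-parallax"}


def stamp_attribution(
    envelope: dict[str, Any], token_payload: dict[str, Any], context: str
) -> dict[str, Any]:
    """Overwrite emitted_by; fill caller_identity only when it is absent."""
    stamped = dict(envelope)
    roles = list(token_payload.get("roles", []))
    # Any inbound emitted_by value is ignored and replaced.
    stamped["emitted_by"] = {
        "token_subject": token_payload["sub"],
        "token_roles": roles,
        "context": context,  # "http" or "in_process"
    }
    if not stamped.get("caller_identity"):
        stamped["caller_identity"] = {"username": token_payload["sub"], "roles": roles}
    return stamped


def needs_review(observation: dict[str, Any]) -> bool:
    """Audit predicate: cross-attribution by a non-service principal."""
    caller = observation["caller_identity"]["username"]
    emitter = observation["emitted_by"]["token_subject"]
    return caller != emitter and emitter not in KNOWN_SERVICE_PRINCIPALS


claimed = {"caller_identity": {"username": "alice", "roles": []},
           "emitted_by": {"token_subject": "alice"}}  # forged inbound value
by_service = stamp_attribution(claimed, {"sub": "svc-cortex", "roles": ["service"]}, "http")
by_user = stamp_attribution(claimed, {"sub": "mallory", "roles": ["user"]}, "http")
```

Here by_service is the legitimate pattern (a service stamping work on alice's behalf), while by_user is the case the audit query is meant to surface.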

Observation cadence (locked)

Apollo records one observation per:

  • Turn boundary (each LLM request/response cycle)
  • Tool invocation (tool_output or tool_error)
  • Error
  • Final response returned to layer 1

Apollo does not record per-token events. Token-level observation is too noisy for the learner and would cause drift in learned artifacts. This is a drift-prevention decision.

Lineage

Every observation carries a trace_id derived from the W3C traceparent header propagated across L1 → L2 → L3 (§Trace Propagation). Related observations (all events from one end-to-end request) share the same trace_id. Cross-trace sub-requests (e.g., scheduled background workflows spawned from a chat turn) use parent_trace_id for hierarchy.

Memory Model

Apollo's memory is two-tiered:

  1. Raw observations — the events listed above, stored in the Elastic apollo_observations index. High volume, time-boxed retention.
  2. Learned artifacts — structured, versioned objects produced by the Learner. Stored in the Elastic apollo_artifacts index. Low volume, long-lived.

Both indices use the Memory(UDS) class from axonis_core.userspace.intelligence as their UDS primitive (platform.axonis-core §Memory Pattern), specialized via subclassing. Apollo does not re-implement the storage surface.

Artifact types

| Artifact | Description |
|---|---|
| DecisionGraph | A specialized graph of decision points and transitions (see §Decision Graphs) |
| DecisionTrajectory | Smoothed trajectory of a graph's evolution over time |
| DriftEvent | Flagged structural shift in a decision graph requiring review or explanation |
| IntentPattern | Recurring front-end intent → successful tool/service routing and output shape |
| IntentSchema | Known layer 1 generator schemas Apollo has learned to recognize |
| SchemaDrift | Layer 1 started emitting a new or changed schema; flagged for admin review |
| PromptShape | Recurring prompt structure correlated with good/bad outcomes |
| ToolPairingHint | "Tool X is usually followed by tool Y in successful runs" |
| FailurePattern | Known failure mode with diagnostic signature and recommended remediation |
| ServiceConnectionHint | "For intent of class Q, service S gives better results than service S'" |
| SpecFragment | Short, targeted spec snippet relevant to a class of intent |
| PromptShim | System-prompt addition that improves outcomes for a class of intent |
| CapabilityMap | Distilled view of which services can satisfy which intents |

Each artifact is a Pydantic model in apollo/artifacts.py backed by a UDS class. Artifacts are versioned — see §Curator.

Index mappings and templates

Every Apollo index is a flat Elasticsearch index (not a data stream, no ILM policy). Mappings are shipped as JSON templates under oracle/apollo/templates/, following the same convention as rest/uds/templates/*_mapping.json:

  • apollo_observations_mapping.json
  • apollo_artifacts_mapping.json
  • apollo_artifact_history_mapping.json
  • apollo_graph_nodes_mapping.json
  • apollo_graph_edges_mapping.json
  • apollo_graph_snapshots_mapping.json
  • apollo_audit_mapping.json

Every mapping includes the standard UDS block (uds.timestamp, uds.username, uds.visibility), create_ts, update_ts, schema_version, and — for time-limited indices — an expires_ts date field (same pattern as rest/uds/templates/memory_mapping.json). Every index follows the Memory(UDS) / Elastic base-class pattern from platform.axonis-core so that CRUD goes through axonis_core.elastic.Elastic.

Retention

Retention is application-managed, not Elastic-ILM-managed. This matches the codebase convention: axonis-core and rest/uds/ do not configure ILM policies, rollovers, or data streams. Each Apollo document that has a bounded lifetime carries an expires_ts field; a periodic maintenance task (see below) runs Elastic.delete_by_query filtering on expires_ts < now() to reclaim space.
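
As a sketch of the application-managed expiry described above: the write path computes expires_ts, and the maintenance pass sends a range query. The helper names are illustrative. The query body is standard Elasticsearch query DSL; how it is handed to Elastic.delete_by_query is not shown because that signature is not given in this document.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Optional


def expires_ts(create_ts: datetime, retention_days: Optional[int]) -> Optional[str]:
    """ISO-8601 expiry, or None for documents with no bounded lifetime."""
    if retention_days is None:
        return None
    return (create_ts + timedelta(days=retention_days)).isoformat()


def expired_query() -> dict[str, Any]:
    """Body matching documents whose expires_ts is in the past.

    A range query never matches documents where the field is missing or
    null, so indefinite-retention rows are left alone.
    """
    return {"query": {"range": {"expires_ts": {"lt": "now"}}}}


created = datetime(2026, 4, 17, 10, 30, tzinfo=timezone.utc)
observation_expiry = expires_ts(created, 30)  # raw observations: 30 days
artifact_expiry = expires_ts(created, None)   # learned artifacts: no expiry
```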

| Class | Index | Expiry mechanism | Retention |
|---|---|---|---|
| Raw observations | apollo_observations | expires_ts = create_ts + 30d set on write | 30 days |
| Graph snapshots (hot) | apollo_graph_snapshots | expires_ts set by coarsening task (see below) | Hourly granularity for 7 days |
| Graph snapshots (warm) | apollo_graph_snapshots | Daily snapshots retained after coarsening | Daily granularity for 30 days |
| Graph snapshots (cold) | apollo_graph_snapshots | Weekly snapshots retained after coarsening | Weekly granularity for 90 days total |
| Learned artifacts | apollo_artifacts | No expires_ts; lifecycle driven by Curator | Indefinite, until forgotten by an admin or demoted by the Evaluator for N cycles |
| Artifact history | apollo_artifact_history | No expires_ts | Indefinite (rollback substrate) |
| Audit log | apollo_audit | expires_ts = create_ts + 90d or null for indefinite | ≥ 90 days (configurable) |

Maintenance task. A periodic background job (default hourly, configurable via APOLLO_MAINTENANCE_INTERVAL) performs:

  1. delete_by_query on any index where expires_ts < now().
  2. Coarsening on apollo_graph_snapshots: hourly rows older than APOLLO_SNAPSHOT_HOURLY_TO_DAILY_AGE_DAYS (default 7) are grouped by (graph_id, calendar date); the most recent row in each group is re-tagged tier="daily" and the rest deleted. Same shape at the daily→weekly boundary: dailies older than APOLLO_SNAPSHOT_DAILY_TO_WEEKLY_AGE_DAYS (default 30) collapse to one weekly row per (graph_id, ISO week). Both windows are env-overridable; see apollo/settings.py for the documented operator profiles, validation rules, and storage trade-offs.

Optional Learner-driven compaction of observations near TTL into apollo_artifacts summaries is not part of this pass. It is event-driven and runs on admin-initiated synthesis or a guidance miss.

The maintenance task uses axonis_core.elastic.Elastic.delete_by_query — no Apollo-specific Elastic client.
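
The hourly→daily coarsening step can be sketched as a pure function over snapshot rows. The row shape (dicts with graph_id, tier, create_ts) and the tier value "hourly" are assumptions for illustration; only tier="daily", the grouping key, and the 7-day default come from this document.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def coarsen_hourly_to_daily(rows, now, age_days=7):
    """Return (rows to re-tag as daily, rows to delete)."""
    cutoff = now - timedelta(days=age_days)
    groups = defaultdict(list)
    for row in rows:
        if row["tier"] == "hourly" and row["create_ts"] < cutoff:
            # Group by (graph_id, calendar date).
            groups[(row["graph_id"], row["create_ts"].date())].append(row)
    keep, delete = [], []
    for group in groups.values():
        group.sort(key=lambda r: r["create_ts"])
        keep.append(group[-1])      # most recent row survives as the daily
        delete.extend(group[:-1])   # the rest are removed
    return keep, delete


def _row(graph_id, day, hour):
    ts = datetime(2026, 4, day, hour, tzinfo=timezone.utc)
    return {"graph_id": graph_id, "tier": "hourly", "create_ts": ts}


rows = [_row("g1", 1, 1), _row("g1", 1, 23), _row("g1", 2, 5),
        _row("g2", 1, 12), _row("g1", 16, 9)]
now = datetime(2026, 4, 17, 10, 30, tzinfo=timezone.utc)
keep, delete = coarsen_hourly_to_daily(rows, now)
```

The daily→weekly pass would have the same shape with the grouping key switched to (graph_id, ISO week), for example via create_ts.isocalendar().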

Retention summary

  • Raw observations: 30 days, then delete_by_query.
  • Graph snapshots: 90 days total, tiered (7d hourly → 30d daily → 90d weekly) via application-level coarsening.
  • Artifacts: indefinite; Curator manages lifecycle; prior versions preserved in apollo_artifact_history forever.
  • Audit log: ≥ 90 days.

Learner

Apollo's Learner is LLM-driven and graph-anchored. Apollo's LLM (see §Apollo's LLM) is the primary engine of synthesis: it processes observations as they arrive (event-driven; see §LLM synthesis below), creates and refines artifacts, classifies intents, diagnoses outcomes, and drives admin chat. The decision graphs are supplemental: they provide deterministic grounding that keeps the LLM anchored and prevents it from drifting.

The relationship is: the LLM reasons flexibly; the graphs remember rigidly. Every LLM call reads the relevant graph state as grounding context. Every LLM output is checked against the graph's trajectory. The LLM cannot propose a pattern that contradicts what the graphs have deterministically recorded without being flagged as drift.

Decision Graphs

Apollo maintains a series of specialized graphs rather than one monolithic graph. Each graph captures a different decision surface:

| Graph | Nodes | Edges |
|---|---|---|
| intent_tool_graph | Intent classes, tool identifiers | "Intent → tool chosen" with outcome weight |
| prompt_shape_graph | Prompt structure clusters | "Shape A evolved into shape B in later iteration" |
| service_routing_graph | Intent classes, backend services | "Intent → service picked" with outcome weight |
| outcome_graph | Decision points, outcome classes | "Decision → outcome produced" with frequency |
| iteration_graph | States within a layer-1 re-run chain | "Iteration N → Iteration N+1 decision delta" |

Cross-graph links exist where decisions in one graph point to nodes in another (e.g., a tool-selection node in intent_tool_graph links to the outcome node in outcome_graph).

Node and edge model

Each node carries:

  • id, graph_id, kind, label
  • Occurrence count, first-seen / last-seen timestamps
  • Outcome distribution (aggregated from incoming observations)
  • Tags for retrieval

Service-namespaced labels

Every label that is naturally service-scoped is prefixed with the emitting service: <envelope.service>/<label>. Concretely, the extractors namespace labels for the following node kinds:

| Graph | Kind | Example label |
|---|---|---|
| intent_tool_graph | intent, tool | cortex/screening, cortex/summarize |
| prompt_shape_graph | prompt_shape | oracle/shape:20:a3f1b2c0 |
| service_routing_graph | intent | parallax/screening |
| outcome_graph | decision_point | cortex/tool:summarize, oracle/conversation:conv_42 |
| iteration_graph | iteration_state | oracle/iter:trc_1234 |

Two node kinds are intentionally not prefixed:

  • service nodes in service_routing_graph carry the service name itself as their identity (e.g., bare cortex). Prefixing would yield the meaningless label cortex/cortex.
  • outcome nodes in outcome_graph carry universal categorical labels (success, error, feedback_up, feedback_down, feedback_abandoned). The per-service split is carried by the decision_point side of the edge, not by fragmenting the outcome taxonomy.

This rule means two backend services that register a tool with the same name (e.g. cortex/summarize and parallax/summarize) form distinct nodes and accumulate counts, EWMA weights, and outcome distributions independently. Downstream synthesis (M8), drift detection (M12), and evaluator scoring (M10) therefore operate on per-service signal rather than a cross-service average.
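
The namespacing rule, including its two exceptions, fits in a few lines. The function name node_label and the UNPREFIXED_KINDS set are illustrative; the kinds and example labels are the ones listed above.

```python
# Node kinds that keep bare labels, per the two exceptions above.
UNPREFIXED_KINDS = {"service", "outcome"}


def node_label(kind: str, service: str, label: str) -> str:
    """Prefix service-scoped labels as <envelope.service>/<label>."""
    if kind in UNPREFIXED_KINDS:
        return label
    return f"{service}/{label}"


cortex_tool = node_label("tool", "cortex", "summarize")
parallax_tool = node_label("tool", "parallax", "summarize")
service_node = node_label("service", "cortex", "cortex")
outcome_node = node_label("outcome", "cortex", "success")
```

Because cortex_tool and parallax_tool differ, the two same-named tools land on distinct nodes and accumulate their statistics separately.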

Each edge carries:

  • source_id, target_id
  • Weight (an outcome-correlation-adjusted transition probability)
  • Count, first-seen / last-seen
  • Recent-window weight (exponentially-weighted moving average over a short horizon)
  • Long-window weight (EWMA over a long horizon)

The divergence between recent-window and long-window weights is the primary drift signal.
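
A small sketch of the two EWMA weights and their divergence, assuming each observation contributes an outcome score in [0, 1]. The smoothing factors 0.3 and 0.02 are placeholders, not values from this document, and EdgeWeights is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class EdgeWeights:
    recent: float = 0.0   # short-horizon EWMA
    long: float = 0.0     # long-horizon EWMA
    count: int = 0

    def update(self, outcome: float, recent_alpha: float = 0.3,
               long_alpha: float = 0.02) -> None:
        if self.count == 0:
            # Seed both averages with the first observation.
            self.recent = self.long = outcome
        else:
            self.recent += recent_alpha * (outcome - self.recent)
            self.long += long_alpha * (outcome - self.long)
        self.count += 1

    @property
    def divergence(self) -> float:
        """Gap between the two horizons; the drift signal."""
        return abs(self.recent - self.long)


edge = EdgeWeights()
for _ in range(50):
    edge.update(1.0)            # a long run of good outcomes
steady = edge.divergence
for _ in range(10):
    edge.update(0.0)            # sudden run of failures
shifted = edge.divergence
```

After the regime change the recent weight collapses quickly while the long weight lags, so shifted is much larger than steady. A per-graph threshold on this gap is one way the drift check could consume it.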

Graph updates (per observation, deterministic)

The Learner's extractors run deterministically on every ingested observation:

  1. Extract decision points (e.g., "intent class", "tool called", "service routed", "outcome class") using rules and lightweight matchers.
  2. Upsert nodes: create new if absent, increment count and update last-seen if present.
  3. Upsert edges: create new or reinforce. Update short-window and long-window weights.
  4. Attach the observation's trace_id to the affected nodes/edges for lineage queries.

No LLM call. No new free-form artifacts. Graph mutations only. This path is the grounding layer — it records what has actually happened in the system, with no interpretation.
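
Steps 2 to 4 above (upsert nodes, upsert edges, attach trace_id) can be sketched with plain dictionaries. The class names Seen and GraphSketch are hypothetical, weight updates are omitted, and keeping every trace_id in an unbounded set is a simplification of whatever the real store does.

```python
from dataclasses import dataclass, field


@dataclass
class Seen:
    """Bookkeeping shared by nodes and edges."""
    count: int = 0
    first_seen: str = ""
    last_seen: str = ""
    trace_ids: set = field(default_factory=set)

    def touch(self, timestamp: str, trace_id: str) -> None:
        if self.count == 0:
            self.first_seen = timestamp      # create
        self.count += 1                      # or reinforce
        self.last_seen = timestamp
        self.trace_ids.add(trace_id)         # lineage hook


@dataclass
class GraphSketch:
    nodes: dict = field(default_factory=dict)  # node_id -> Seen
    edges: dict = field(default_factory=dict)  # (source_id, target_id) -> Seen

    def record(self, source_id: str, target_id: str,
               timestamp: str, trace_id: str) -> None:
        for node_id in (source_id, target_id):
            self.nodes.setdefault(node_id, Seen()).touch(timestamp, trace_id)
        self.edges.setdefault((source_id, target_id), Seen()).touch(timestamp, trace_id)


graph = GraphSketch()
graph.record("cortex/screening", "cortex/summarize", "2026-04-17T10:30:00Z", "trc_a")
graph.record("cortex/screening", "cortex/summarize", "2026-04-17T11:00:00Z", "trc_b")
edge = graph.edges[("cortex/screening", "cortex/summarize")]
```

The same inputs always produce the same graph state, which is the property that lets this path serve as grounding for the LLM.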

Snapshots and trajectory

  • Snapshots. Each graph is snapshotted on a cadence (default: hourly; configurable via APOLLO_GRAPH_SNAPSHOT_INTERVAL) into the Elastic apollo_graph_snapshots index. Snapshots are the substrate for past-vs-current comparison.
  • Trajectory. A projection of near-future graph state from current EWMA velocities. Used by Guidance to pre-warm likely-next decisions and by drift detection to establish an expected trajectory.

LLM synthesis (event-driven, primary driver)

Apollo's LLM runs the primary synthesis engine and is event-driven, not scheduled. It is invoked in response to specific observation events — not on a timer, not on a batch threshold. The cadence of synthesis matches the cadence of actual system activity.

Synthesis triggers.

| Trigger | Inbound event |
|---|---|
| Layer 1 sends a request | intent_schema or user_prompt observation ingested |
| Layer 3 returns an output | tool_output, tool_error, or final_response ingested |
| Admin chat turn | POST /api/v1/apollo/chat request |
| Admin-initiated synthesis | POST /api/v1/apollo/learn request |

Other observation types (llm_turn from oracle itself) feed the graphs but do not trigger synthesis on their own — they are intermediate steps between a Layer 1 request and a Layer 3 output. Novel-intent synthesis occurs naturally on the Layer 1 / Layer 3 triggers above; no GET /guidance request from L1 or L3 ever drives synthesis (§Invariants 14).

Inputs on each synthesis call.

  • The triggering observation (or chat turn)
  • The relevant subgraph state from each decision graph (grounding context)
  • Active artifacts that match the observation's intent/tool/service fingerprints
  • Recent evaluator scores for matched artifacts
  • Prior synthesis output for the same trace_id, if any (for continuity within a request lineage)

Outputs.

  • Proposed new artifacts (IntentPattern, FailurePattern, etc.)
  • Proposed edits to existing artifacts
  • Proposed promotions/demotions
  • Drift flags when the LLM itself detects divergence
  • Compaction proposals for old observations near TTL
  • For admin chat and admin-initiated triggers: a direct response returned to the caller

All outputs are structured Pydantic models. The Curator commits them only after the graph-anchor drift check (below) clears.

Concurrency. A burst of Layer 3 tool outputs (e.g., a fusion run with many tool calls) can trigger many near-simultaneous synthesis calls. Apollo bounds concurrent synthesis via APOLLO_SYNTHESIS_MAX_CONCURRENT (default 4) with a queue of pending triggers. Duplicate triggers within the same trace_id are coalesced: only the latest observation in a lineage is processed.
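
One way to picture the bound plus coalescing is a pending map keyed by trace_id with an in-flight counter. This is a synchronous sketch under assumed names (SynthesisQueue, submit, start_next, finish); a real dispatcher would presumably be asynchronous, and max_concurrent stands in for APOLLO_SYNTHESIS_MAX_CONCURRENT.

```python
from collections import OrderedDict


class SynthesisQueue:
    def __init__(self, max_concurrent: int = 4) -> None:
        self.max_concurrent = max_concurrent
        self._pending = OrderedDict()   # trace_id -> latest observation
        self._in_flight = 0

    def submit(self, trace_id: str, observation: dict) -> None:
        # Coalesce: a newer trigger for the same trace replaces the older one
        # while keeping the trace's original place in the queue.
        self._pending[trace_id] = observation

    def start_next(self) -> list:
        """Start as many pending triggers as the concurrency bound allows."""
        started = []
        while self._pending and self._in_flight < self.max_concurrent:
            started.append(self._pending.popitem(last=False))  # FIFO
            self._in_flight += 1
        return started

    def finish(self) -> None:
        self._in_flight -= 1

    @property
    def pending(self) -> int:
        return len(self._pending)


queue = SynthesisQueue(max_concurrent=4)
queue.submit("t1", {"seq": 1})
queue.submit("t1", {"seq": 2})          # coalesced with the previous trigger
for n in range(2, 7):
    queue.submit(f"t{n}", {"seq": 1})
first_batch = queue.start_next()
```

With six distinct traces pending and a bound of 4, first_batch holds four triggers and two remain queued until finish() frees a slot.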

Graph-anchor drift check

The graphs are the anti-drift mechanism. Every LLM synthesis output is validated against the graphs before the Curator commits it:

  • Proposed pattern vs. recorded edges. If the LLM proposes "tool X is typically followed by tool Y" but the intent_tool_graph edge X→Y has low weight or is absent, the proposal is flagged.
  • Proposed intent classification vs. node clusters. If the LLM introduces an intent class that does not correspond to any node cluster in the graphs, flagged.
  • Weight swings. If the LLM's proposal would effectively invert a strongly-weighted edge, flagged — even if the LLM's reasoning is plausible, this is exactly the shape of drift.
  • Trajectory coherence. If the LLM's proposed trajectory diverges from the graph's EWMA projection, flagged.

Flagged outputs produce a DriftEvent artifact. The Curator does not commit a flagged proposal autonomously — admin review is required via chat or the audit surface. This is how the graphs protect the LLM from itself.
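
A hedged sketch of two of the checks above (recorded-edge support and inversion of a strong edge), reduced to a single edge weight. The function name, flag strings, and both thresholds are invented for illustration; this document only says thresholds are configurable.

```python
from typing import Optional


def anchor_flags(
    claims_typical: bool,
    recorded_weight: Optional[float],
    low: float = 0.2,      # hypothetical "low weight" threshold
    strong: float = 0.8,   # hypothetical "strongly weighted" threshold
) -> list[str]:
    """Return drift flags for one proposed X -> Y pattern.

    claims_typical is True when the LLM proposes "X is typically followed
    by Y", False when it proposes the opposite. recorded_weight is the
    graph's weight for the X -> Y edge, or None when the edge is absent.
    """
    if claims_typical:
        if recorded_weight is None:
            return ["edge_absent"]
        if recorded_weight < low:
            return ["edge_low_weight"]
        return []
    if recorded_weight is not None and recorded_weight >= strong:
        return ["inverts_strong_edge"]
    return []


def may_autocommit(flags: list[str]) -> bool:
    # Any flag means the proposal is held for admin review.
    return not flags


supported = anchor_flags(True, 0.65)
unsupported = anchor_flags(True, None)
inverted = anchor_flags(False, 0.9)
```

Only supported would be committed autonomously; the other two would each yield a DriftEvent and wait for an admin.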

Drift vs. evolution

The graph-anchor check distinguishes:

  • Evolution — LLM synthesis outputs consistent with graph trajectory; graph weights shift smoothly as observations accumulate. Proposals are committed autonomously by the Curator.
  • Drift — LLM synthesis outputs diverge from graph state; sudden edge-weight swings; emergent nodes appearing faster than configured rate caps. Proposals are held for admin review.

Thresholds are per-graph and configurable (z-score on weight deltas, rate-of-new-nodes caps, divergence tolerance on LLM outputs).

Storage

  • Graph nodes and edges live in the Elastic apollo_graph_nodes and apollo_graph_edges indices (UDS-backed, per platform.axonis-core invariant 2).
  • A working in-memory mirror of the active graphs is maintained for hot-path reads (guidance, drift detection). The in-memory mirror is derived state; it is always rebuildable from Elastic.
  • Snapshots live in apollo_graph_snapshots. Snapshots are immutable after write.

Guidance API (admin inspection only)

Apollo delivers guidance to L1 and L3 LLMs exclusively via the response-attached Injection Channel (see §Injection Channel). L1 and L3 do not pull guidance — there are no GET calls from those layers in the runtime path.

The GET /guidance* endpoints below are retained as admin inspection tools only: admins and admin chat tooling use them to preview what Apollo would currently inject for a given intent, layer, or subscriber. They are gated to role admin via oracle's guardrails (component.oracle.gateway §Guardrails).

L3 operational libraries (no LLM) receive no guidance. Oracle emits observations on their behalf, and they are otherwise opaque to Apollo.

Endpoints

GET  /api/v1/apollo/guidance?intent=<query>&layer=1|3
GET  /api/v1/apollo/guidance/schemas
GET  /api/v1/apollo/guidance/tools
GET  /api/v1/apollo/guidance/specs
GET  /api/v1/apollo/guidance/connections

The top-level /guidance endpoint accepts an intent description (free text or structured) and the consuming layer, and returns a ranked set of applicable artifacts — previewing what Apollo would currently inject. The sub-paths return filtered artifact views by type.

All endpoints require the admin role. L1 and L3 never call them (§Invariants 14).

Example response

{
  "intent_match": {"pattern_id": "ipat_abc", "score": 0.88},
  "schemas": [...],
  "tools": [
    {"name": "fusion_run_start", "description_override": "...", "routing_hint": "parallax"}
  ],
  "specs": [
    {"id": "spec_frag_123", "content": "For federate alignment, ensure lens binding..."}
  ],
  "connections": [
    {"from": "layer1.screening_intent", "to": "parallax.fusion", "confidence": 0.91}
  ]
}

Workflow Generation Hints

Oracle's gateway owns the natural-language→workflow-graph orchestration contract (component.oracle.gateway §workflow-generation). Apollo is the enhanced-generation and guidance half: it shapes what gets generated and surfaces quality hints about a workflow, using its observation and learning layers. Apollo never executes operations and never authors the workflow itself — it injects guidance into the generation path and annotates workflows with advisory hints.

Generation Guidance

When the gateway drives a workflow-generation request, Apollo contributes guidance through the same response-attached Injection Channel it uses for all L1/L3 guidance (§Injection Channel) — it is not a separate pull path.

  • #REQ.workflow-gen-guidance — for a workflow-generation intent, Apollo may attach matched guidance artifacts (intent→operation patterns, tool routing hints, learned successful-workflow exemplars) to the generation request, raising the likelihood that the produced node graph is valid and idiomatic. This is advisory: the gateway's generation contract (component.oracle.gateway §workflow-generation.contract) remains the source of truth for request/response shape.
  • #REQ.workflow-gen-learning — Apollo observes generated-workflow outcomes (accepted, edited, discarded, execution success/failure) as observations and feeds them to the learner, so generation guidance improves over time. Apollo does not persist per-call generation state beyond its standard observation model.

Workflow Quality Hints

Apollo analyses a (generated or user-built) modelling workflow and emits advisory hints that a frontend can attach to the workflow — flagging issues without blocking the user.

  • #REQ.workflow-hints — Apollo's workflow analysis produces hints in three categories: missing best practice, data quality issue, and modeling issue. Each hint is advisory (non-blocking), carries the node/edge it applies to, and a human-readable rationale.
  • #REQ.workflow-hints-scope — Apollo's hint scope is the modelling-workflow layer (operation ordering, modelling-step soundness, best-practice gaps). Dataset-level quality analysis routines — computing dataset-quality metrics themselves — are owned elsewhere on the ML surface, not by Apollo; Apollo consumes their signals to phrase a hint but does not implement the dataset-analysis routines.

Injection Channel

Apollo delivers guidance to L1 and L3 LLMs by attaching it to the existing request/response flow — symmetric piggybacking in both directions. There is no separate push transport, no long-lived connection, no service token, no SSE client in production. Guidance is computed at request time (in-process inside oracle, since Apollo lives there) and embedded in the envelope that was already travelling.

  • L1 path: oracle attaches current applicable guidance to every /chat response body.
  • L3 path: oracle attaches current applicable guidance to every outbound MCP tool dispatch.

Both paths are fresh-per-call by construction — there is no cache to go stale, no reconnect to replay, no disconnected subscriber to reconcile. Apollo lives inside oracle, so fetching guidance for an outbound envelope is an in-process Python call, not a network hop.

Guidance communication is unidirectional: Apollo → L1, Apollo → L3. Subscribers never POST guidance back (observation ingest is a separate path — §Ingest Semantics). Captured as §Invariants 14.

Why response-attached instead of a push channel

L1 is only doing LLM work when composing a response to the user's latest message — the act of calling /chat is what triggers that work. Any guidance change Apollo makes while L1 is idle has nothing to apply to until the next /chat, at which point the response can carry the freshest state. A separate push channel for idle L1 therefore provides no observable benefit and introduces a long-lived auth session to maintain.

L3 agents only exist inside a user-request context (oracle dispatches to them; they validate the forwarded user token). There is no service-token mechanism in axonis-core today (see §Authentication & Authorization). A long-lived L3 connection would require inventing one. Attaching guidance to the MCP dispatch uses the existing user-token-forwarding pattern and delivers guidance exactly when the agent needs it.

L1 path: attached to /chat responses

POST /api/v1/chat is oracle's user-facing chat surface, driven by oracle's own LLM tool-use loop (oracle/server/llm/tool_executor.py, 5 providers: anthropic / openai / groq / ollama / trinity). It is distinct from Apollo's admin chat at POST /api/v1/apollo/chat, which runs Apollo's separate MiniMax LLM for talking to Apollo's synthesis brain.

When oracle responds to a POST /chat, it calls Apollo's in-process apollo.guidance.for_l1(user=..., intent_context=...) before serializing, and embeds the result on the response envelope under apollo_guidance. Beacon-style L1 clients consume that field via their local ApolloGuidanceCache.update(...).

Model extension. Oracle's existing ChatResponse Pydantic model (oracle/server/api/routes.py) must be extended with an optional field:

from pydantic import BaseModel, Field

class ChatResponse(BaseModel):
    response: str
    conversation_id: str
    tool_calls: list = Field(default_factory=list)
    model_used: str = ""
    tokens: dict = Field(default_factory=dict)
    apollo_guidance: dict | None = Field(default=None)   # added by component.oracle.apollo

The field defaults to None, so pre-Apollo clients and responses where guidance is omitted (attach-timeout, Apollo unavailable, empty applicable set) serialize identically to today. Clients that don't know about the field simply ignore it.

Envelope shape when guidance is present:

{
  "response": "...assistant reply...",
  "conversation_id": "...",
  "tool_calls": [...],
  "model_used": "...",
  "tokens": {...},
  "apollo_guidance": {
    "as_of": "2026-04-17T10:30:00Z",
    "artifacts": [
      {
        "id": "pshim_xyz",
        "type": "PromptShim",
        "version": 7,
        "content": { ... },
        "applicability": { "intent_class": "...", "tags": [...] },
        "rationale": "Human-readable explanation of why this artifact is active now."
      }
    ],
    "rationale_summary": "3 PromptShim (s1,s2,s3); 2 ToolPairingHint (h1,h2) +1 capped (c1)"
  }
}

L1 receives the response, hands apollo_guidance.artifacts to its local ApolloGuidanceCache, and renders the assistant message. The payload is the complete applicable set for this user's L1 scope — not a diff — so cache replacement is strictly idempotent.

On the next user turn, L1 uses the freshly-populated cache to compose its prompt. Guidance staleness is bounded by a single turn.

L2 path: in-process cache for oracle's own chat LLM

Oracle's chat surface at POST /api/v1/chat runs its own LLM tool-use loop in oracle/server/llm/tool_executor.py (5-provider gateway: anthropic / openai / groq / ollama / trinity). Oracle is therefore also a guidance subscriber for its own LLM — distinct from L1 (beacon's LLM) and L3 (cortex/parallax's LLMs).

Because oracle hosts Apollo, no transport is needed. Oracle owns a process-local ApolloGuidanceCache populated directly from apollo.guidance.for_l2(...) (analogous to for_l1 and for_l3_agent) before each LLM turn. The tool-executor consults the cache via the canonical accessors (get_system_prompt_additions, get_spec_fragments, get_active_failure_patterns, get_tool_pairing_hints, get_tool_description_overrides, get_service_connection_hints) on every turn and folds the results into its system prompt and tool-catalog rendering, exactly as L1 and L3 subscribers do.

The L2 path is symmetric with L1/L3 in artifact applicability filtering (scope=l2 on the attacher), in the timeout budget (APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS), and in the failure posture (cache miss / timeout → tool-executor proceeds with no guidance, request still succeeds). It differs in transport only: no JSON serialisation, no envelope traversal — a direct in-process call.

L3 path: attached to MCP tool dispatches

When oracle dispatches a tool call to an L3 agent (component_kind == "agent"), oracle attaches Apollo's currently-applicable guidance inside the tool's arguments dict under the apollo_guidance key, mirroring the existing pattern oracle uses to inject llm_spec into arguments (oracle/server/mcp/server.py). This keeps the JSON-RPC envelope shape unchanged (params stays {name, arguments}) and requires no MCP handler changes on the agent side beyond extracting and applying the new argument:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "fusion_run_start",
    "arguments": {
      "...tool-specific args...": "...",
      "apollo_guidance": {
        "as_of": "2026-04-17T10:30:00Z",
        "artifacts": [ ... ],
        "rationale_summary": "..."
      }
    }
  }
}

L3 agent-side MCP handlers extract apollo_guidance from arguments (the same way they currently extract llm_spec), hand it to their local ApolloGuidanceCache for the duration of this request's LLM turns, and strip it before passing the remaining arguments to the tool's business logic. Because L3 only acts inside a user-request context, cache lifetime naturally scopes to the request — no background state, no long-lived connection, no service-token novelty.

L3 libraries (component_kind == "library") do not receive apollo_guidance in their dispatches — oracle filters them out before serialization. Libraries have no LLM to improve (§Invariants 15).
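
The attach-and-strip handshake described above can be sketched as two small functions, one on the oracle side and one on the agent side. Both function names are hypothetical; only the apollo_guidance key, the component_kind values, and the JSON-RPC envelope shape come from this section.

```python
GUIDANCE_KEY = "apollo_guidance"


def build_tool_call(name, arguments, guidance, component_kind, request_id=1):
    """Oracle side: build a tools/call request, attaching guidance for agents only."""
    args = dict(arguments)  # copy so the caller's dict is not mutated
    if component_kind == "agent" and guidance:
        args[GUIDANCE_KEY] = guidance
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": args},
    }


def split_guidance(arguments):
    """Agent side: return (guidance, remaining_args); business logic sees only the latter."""
    remaining = dict(arguments)
    guidance = remaining.pop(GUIDANCE_KEY, None)
    return guidance, remaining
```

Because the guidance rides inside arguments, the params object keeps its two keys and a library dispatch is simply the same request without the extra argument.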

Payload shape

apollo_guidance carries:

  • as_of — timestamp of the artifact snapshot. Used for traceability and admin debugging.
  • artifacts — the currently-applicable artifact set for the subscriber's scope. Each artifact has id, type, version, content, applicability, and rationale (see §Rationale and evidence).
  • rationale_summary — structured one-liner naming the attached artifact IDs per type, plus a +N capped (...) tail for artifacts the per-type cap held back. See §Prioritization Layers → Layer 5 for the exact format and the parallel aggregate_artifact_stats query for per-artifact stats.

There is no injection_id, no reason/trigger enum, no subscriber_scope, no evidence_ref on the per-call payload. That metadata lives in the audit log (§Audit log) — attaching it to every response/dispatch would balloon payload size with data that matters to admins, not to LLMs.

Freshness and ordering

Guidance is always at most one turn stale from each subscriber's perspective:

  • L1's next /chat call sees the freshest guidance. Between turns, L1's cache reflects the guidance as-of the most recent response.
  • L3's MCP dispatch carries guidance computed at the instant oracle is about to call. By construction the agent sees guidance current at dispatch time.

Because the cache is overwritten on every inbound response/dispatch, there is no "subscriber drift" problem to solve — the cache cannot diverge from Apollo.

Triggers (synthesis unchanged)

Apollo's Curator still commits artifact mutations event-driven (§Learner → LLM synthesis). The commits no longer trigger separate push events — they simply become the state that the next attached apollo_guidance payload reflects. Pause/resume of the Curator is therefore also a passive effect: paused Curator → artifact set stops changing → subscribers keep receiving the same state on subsequent calls.

Failure posture

  • Apollo slow: oracle's guidance-fetch call has a strict in-process budget (APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS, default 10 ms). On overshoot, oracle serializes the response/dispatch without apollo_guidance. Subscribers proceed without guidance on that turn — equivalent to pre-Apollo behavior. No user-visible failure.
  • Apollo has no applicable guidance: apollo_guidance is omitted (or null). Subscribers proceed without guidance. Normal state during Phase 1.
  • Apollo hallucinates a bad artifact: subscribers apply it on one turn; the Evaluator detects outcome degradation, demotes; the next attached payload reflects the demotion. Admin can force rollback at any time.
  • No network partition risk: Apollo is in-process with oracle. There is no network path between them that can fail.
  • Curator paused: attached payloads continue to reflect the state as-of the pause. Subscribers see frozen guidance until resume. Because every response/dispatch still carries the current set, subscribers never lose their guidance due to the pause — it just stops changing.
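
The "Apollo slow" and "no applicable guidance" postures above amount to a bounded call with a silent fallback. The sketch below shows one way to express that with a thread pool; the function name, the fetch callable, and the use of concurrent.futures are assumptions, not oracle's actual attach code.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

_EXECUTOR = ThreadPoolExecutor(max_workers=4)


def attach_guidance(envelope, fetch, timeout_ms=10):
    """Return envelope plus apollo_guidance, or the envelope unchanged on any problem.

    fetch: zero-argument callable returning a guidance dict (hypothetical stand-in
    for the in-process Apollo call). timeout_ms mirrors APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS.
    """
    future = _EXECUTOR.submit(fetch)
    try:
        guidance = future.result(timeout=timeout_ms / 1000)
    except FutureTimeout:
        # cancel() only helps if the call has not started; a running call
        # finishes in the background and its result is discarded.
        future.cancel()
        return envelope
    except Exception:
        return envelope  # Apollo failure must never fail the user request
    if not guidance or not guidance.get("artifacts"):
        return envelope  # nothing applicable: omit the field entirely
    return {**envelope, "apollo_guidance": guidance}
```

The caller serializes whatever comes back, so a timeout is indistinguishable from pre-Apollo behavior for the subscriber.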

Rationale and evidence

Each artifact in the attached payload carries a rationale string (LLM-synthesized for LLM-driven proposals; templated from score decomposition for deterministic Evaluator actions). This is the same rationale written into apollo_audit (§Audit log). Subscribers may log it when applying the artifact to a prompt; admins query it via audit log or admin chat (§Admin Chat).

The fuller evidence_ref (pointers to observations, graph snapshot id, score decomposition, related drift events) is not carried in the per-call payload — it lives in apollo_audit. Admins retrieve it via explain_decision / discuss_decision in admin chat, which resolves the audit record.

Audit

Every Curator action writes an apollo_audit record with action, actor, trigger, rationale, and evidence_ref (§Audit log). Individual deliveries — attached payloads on responses and dispatches — are not audited. Delivery would produce one record per user turn per layer, far too noisy to be useful. The audit captures decisions; deliveries are implementation detail.

Subscriber SDK: ApolloGuidanceCache (pure local cache)

ApolloGuidanceCache in axonis-core is a pure in-memory cache with no transport. It has two surfaces:

Update (called by the subscriber's request handler):

  • cache.update(apollo_guidance_block) — replaces the cache's artifact set with the payload. Idempotent; the payload is the complete applicable set, not a diff.

Canonical accessors (consumed by the subscriber's LLM-turn codepath):

Method | Returns | Used at
get_system_prompt_additions(intent_context) | Ordered list of PromptShim bodies | System-prompt construction
get_spec_fragments(intent_context) | List of SpecFragment | RAG-like context insertion
get_tool_description_overrides(tool_name) | Override dict or None | Tool-catalog rendering
get_tool_pairing_hints(current_tool) | List of ToolPairingHint | After-tool-call reasoning
get_active_failure_patterns(intent_context) | List of FailurePattern with diagnostic hints | Pre-call guard; post-call error interpretation
get_service_connection_hints(intent_context) | List of ServiceConnectionHint | Service routing

Applicability filtering happens inside the cache: each artifact's applicability block is matched against the caller's intent_context. When multiple artifacts of the same type match, the SDK returns them ordered by (weight desc, recency desc); merge policy past ordering is the agent's choice.

No HTTP client, no long-lived connection, no authentication inside the SDK — the cache is a data structure inside the subscriber's process. platform.axonis-core invariant 1 (axonis-core has no ML dependencies) is preserved; ApolloGuidanceCache is pure Python data structures.

Empty-cache fallback: if no apollo_guidance has yet been delivered to this subscriber (first call, Apollo off, APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS overshoot), all accessors return empty lists / None. The subscriber proceeds without guidance. This is the safe default pre-Apollo behavior.
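
To make the replace-on-update and empty-cache behavior concrete, here is a minimal stand-in for the cache with a single accessor. It is a sketch under stated assumptions, not the axonis-core class: the class name, the intent_class matching rule, the content.text field, and the use of version as a recency proxy are all hypothetical.

```python
class GuidanceCacheSketch:
    """Illustrative stand-in for ApolloGuidanceCache (pure data, no transport)."""

    def __init__(self):
        self._artifacts = []

    def update(self, guidance_block):
        # Full replacement: the payload is the complete applicable set, not a diff.
        self._artifacts = list((guidance_block or {}).get("artifacts", []))

    def _matching(self, artifact_type, intent_context):
        wanted = intent_context.get("intent_class")
        hits = []
        for art in self._artifacts:
            if art.get("type") != artifact_type:
                continue
            scoped = art.get("applicability", {}).get("intent_class")
            # Assumption: an artifact without intent_class applies to every intent.
            if scoped is None or scoped == wanted:
                hits.append(art)
        # Order by (weight desc, recency desc); version stands in for recency.
        hits.sort(
            key=lambda a: (a.get("content", {}).get("weight", 1.0), a.get("version", 0)),
            reverse=True,
        )
        return hits

    def get_system_prompt_additions(self, intent_context):
        return [
            a.get("content", {}).get("text", "")
            for a in self._matching("PromptShim", intent_context)
        ]
```

Calling update twice with the same block leaves the cache in the same state, and a cache that has never been updated returns an empty list.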

Admin inspection

Admins can preview what Apollo would attach on the next request:

Method | Path | Purpose
GET | /api/v1/apollo/guidance?scope=l1 | Preview current L1-scoped artifact set
GET | /api/v1/apollo/guidance?scope=l3:<service_name> | Preview current L3-scoped artifact set for a given agent
GET | /api/v1/apollo/guidance/stream?scope=<scope> | Admin-only SSE feed of Curator commits in real time (debugging)

The SSE feed is a debugging aid only — production delivery never uses it. All admin inspection endpoints require the admin role.

Prioritization Layers

The attacher's job is not only to find applicable artifacts but to choose which subset reaches the receiver's LLM. A naive "match everything, send everything" strategy is wrong on two axes: it bloats the receiver's prompt budget once the artifact set grows, and it makes operator-promoted "preferred" artifacts indistinguishable from low-value ones. Apollo's prioritization story is implemented as six cooperating layers (nine counting the 4-A/4-B and 6-A/6-B/6-C sub-layers); together they make selection observable, quality-aware, and bounded.

The layers are ordered by data dependency — earlier layers don't depend on later ones, and each layer's surface stays useful even if the layers above it are disabled.

Layer 1 — Capped-artifact observability

When the per-type attach cap drops an artifact (see Layer 2 for the cap mechanism itself), each held-back artifact gets a row in apollo_lineage_events with kind: "capped", the artifact's artifact_type, the call's scope and trace_id. Two query paths read these rows:

  • query_capped_for_artifact(artifact_id, *, service_name=None, limit=500) — list traces where this artifact was capped.
  • aggregate_artifact_stats(artifact_id, *, since=None, limit=1000) → {attached_count, capped_count, last_attached_at, last_capped_at}.

Both surfaces are exposed on the admin REST API as GET /lineage/capped and GET /artifacts/{artifact_id}/stats. The same lineage rows are also available to the evaluator for "matched-but-shadowed" diagnostics.

Invariant. query_traces_with_artifact and query_trace_attribution filter kind: "capped" out by default — the "applied" semantics of /lineage is unchanged.
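
A small in-memory sketch may help show how the two query paths and the default filter relate. It works over a plain list of lineage rows instead of Elastic; the function names, the "attached" kind, and the ts field are assumptions (the source only names kind: "capped").

```python
def applied_rows(rows, include_capped=False):
    """Default view drops capped rows so "applied" semantics stay unchanged."""
    return [r for r in rows if include_capped or r["kind"] != "capped"]


def aggregate_stats_sketch(rows, artifact_id):
    """Per-artifact counts and latest timestamps over lineage rows."""
    stats = {
        "attached_count": 0,
        "capped_count": 0,
        "last_attached_at": None,
        "last_capped_at": None,
    }
    for row in rows:
        if row["artifact_id"] != artifact_id:
            continue
        prefix = "capped" if row["kind"] == "capped" else "attached"
        stats[f"{prefix}_count"] += 1
        key = f"last_{prefix}_at"
        # ISO-8601 UTC strings compare correctly as plain strings.
        if stats[key] is None or row["ts"] > stats[key]:
            stats[key] = row["ts"]
    return stats
```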

Layer 2 — Selection sort key

apollo.guidance.attacher._sort_key orders matched artifacts before the cap fires. Each tier has a default that preserves the previous tier's behavior, so the chain stays well-defined even with sparse data:

Tier | Source | Default when missing
1 | content.evaluator_score | 1.0 (innocent until signaled)
2 | content.confidence | 0.0 (no opinion stated)
3 | applicability specificity (count of populated narrowing fields) | 0
4 | content.weight | 1.0
5 | as_of | ""

evaluator_score defaults to 1.0 to match ArtifactScore.score's baseline (a never-signaled artifact is treated as innocent). confidence defaults to 0.0 because synthesis confidence is an opt-in endorsement — absence means "no opinion." Specificity activates today and is the practical lever when the upper tiers tie; tiers 1 and 2 become load-bearing once their sources flow (see Layers 4-A and 4-B).

Per-type caps live in config:

APOLLO_ATTACH_CAP_PROMPT_SHIM=10
APOLLO_ATTACH_CAP_SPEC_FRAGMENT=5
APOLLO_ATTACH_CAP_TOOL_PAIRING_HINT=5
APOLLO_ATTACH_CAP_FAILURE_PATTERN=10
APOLLO_ATTACH_CAP_SERVICE_CONNECTION_HINT=5
APOLLO_ATTACH_CAP_INTENT_PATTERN=5

ApolloGuidanceCache._sorted on the receiver side uses an identical priority key so the order the sender selected is preserved through to the LLM.
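
The tiered key and the per-type cap can be sketched as follows. This is an illustration under assumptions, not the attacher's code: the narrowing-field names, the location of as_of on the artifact, and the choice to sort every tier in descending order are all hypothetical.

```python
# Hypothetical set of applicability fields that count toward specificity.
_NARROWING_FIELDS = ("intent_class", "service_name", "tool_name", "tags")


def sort_key(artifact):
    """Five-tier key with the documented defaults for missing values."""
    content = artifact.get("content", {})
    applicability = artifact.get("applicability", {})
    specificity = sum(1 for f in _NARROWING_FIELDS if applicability.get(f))
    return (
        content.get("evaluator_score", 1.0),  # tier 1: innocent until signaled
        content.get("confidence", 0.0),       # tier 2: no opinion stated
        specificity,                          # tier 3
        content.get("weight", 1.0),           # tier 4
        artifact.get("as_of", ""),            # tier 5
    )


def select(artifacts, caps):
    """Split matched artifacts into (attached, capped) using per-type caps."""
    by_type = {}
    for art in artifacts:
        by_type.setdefault(art["type"], []).append(art)
    attached, capped = [], []
    for art_type, group in by_type.items():
        group.sort(key=sort_key, reverse=True)  # best first
        cap = caps.get(art_type, len(group))    # no cap configured: keep all
        attached.extend(group[:cap])
        capped.extend(group[cap:])
    return attached, capped
```

The capped list is what Layer 1 would record as kind: "capped" rows.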

Layer 3 — Signal preservation at promote

The promote action's content-extraction helper (_content_from_proposal) strips proposal metadata before storing on the artifact. The three ranking signals (evaluator_score, confidence, weight) must not be added to the metadata strip-list. The constants _METADATA_KEYS and _RANKING_SIGNALS in apollo/curator/actions.py make this contract explicit; TestRankingSignalContract enforces it.

Invariant. If a proposal carries evaluator_score, confidence, or weight at the top level, the promoted artifact's content must carry them too.

Layer 4-A — Evaluator score writeback

apollo/evaluator/persist.py:persist_score_to_artifact writes content.evaluator_score and content.score_decomposition to the artifact document after every signal application in the ingest worker. It uses a Painless script so that the type-specific content fields (text, signature, etc.) are preserved.

Properties:

  • Fire-and-forget. Never blocks the ingest hot path.
  • Idempotent. retry_on_conflict=3 handles concurrent worker writes.
  • Kill-switch. APOLLO_EVALUATOR_PERSIST_SCORES_ENABLED=false disables persistence without touching the in-memory engine (audit + cascade paths still work).
  • Graceful degradation. Failures are logged and counted (apollo_evaluator_score_persist_failed_total); the in-memory engine remains authoritative.
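
A scripted partial update is what lets the writeback touch two fields without overwriting the rest of content. The sketch below only builds the request for Elasticsearch's Update API and sends nothing; the helper name and the returned dict layout are hypothetical, and it assumes the artifact document already has a content object.

```python
def build_score_update(artifact_id, score, decomposition, index="apollo_artifacts"):
    """Describe an Update API call that sets two fields inside content."""
    return {
        "method": "POST",
        "path": f"/{index}/_update/{artifact_id}",
        "query": {"retry_on_conflict": 3},  # retry on concurrent worker writes
        "body": {
            "script": {
                "lang": "painless",
                # Assigning individual fields leaves text, signature, etc. untouched.
                "source": (
                    "ctx._source.content.evaluator_score = params.score; "
                    "ctx._source.content.score_decomposition = params.decomposition;"
                ),
                "params": {"score": score, "decomposition": decomposition},
            }
        },
    }
```

A fire-and-forget caller would submit this request off the hot path and count failures rather than raise.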

Layer 4-B — Synthesis confidence

Every synthesis prompt (build_failure_pattern_prompt, build_intent_pattern_prompt, build_prompt_shim_prompt, build_sweep_prompt) requires the LLM to emit a top-level confidence: 0.0..1.0. The _SHARED_RULES block explains the semantics — reserve high confidence for patterns the model would stake its reputation on, because Apollo uses it to rank artifacts at attach time.

apollo/learner/synthesis.py:_normalize_confidence is called from _record_proposal and:

  • Clamps values to [0.0, 1.0].
  • Coerces missing or unparseable inputs to _NEUTRAL_CONFIDENCE = 0.5 so a malformed LLM response doesn't unfairly downrank an otherwise-valid proposal.

The normalized value rides on the proposal through promote (via the Layer 3 contract) onto artifact.content.confidence, where Layer 2's sort consumes it.
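
The clamp-and-coerce rule is small enough to sketch directly. The version below is an illustration, not the code in apollo/learner/synthesis.py; treating NaN as unparseable is an added assumption.

```python
import math

_NEUTRAL_CONFIDENCE = 0.5


def normalize_confidence(raw):
    """Clamp to [0.0, 1.0]; fall back to neutral when the value cannot be parsed."""
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return _NEUTRAL_CONFIDENCE  # missing (None) or non-numeric text
    if math.isnan(value):
        return _NEUTRAL_CONFIDENCE  # assumption: NaN counts as unparseable
    return min(1.0, max(0.0, value))
```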

Layer 5 — Deepened rationale_summary + per-artifact aggregation

apollo.guidance.attacher._summarize emits a structured summary that names attached and capped artifact IDs per type:

"3 PromptShim (s1,s2,s3); 2 ToolPairingHint (h1,h2) +1 capped (c1)"

ID lists truncate to _SUMMARY_ID_PREVIEW = 5 with a +N tail. Empty input still produces "". Types are sorted alphabetically so summaries diff cleanly across calls.

aggregate_artifact_stats (Layer 1) is the symmetric on-demand summary keyed by artifact rather than by attach call.
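
One plausible reading of the summary format is sketched below. It reproduces the example string above, but two details are guesses: that the "+N capped" tail is emitted per type, and that a truncated ID list ends with a ",+N" marker. The function names are hypothetical.

```python
_SUMMARY_ID_PREVIEW = 5


def _preview(ids):
    """Show at most _SUMMARY_ID_PREVIEW IDs, then a +N marker (assumed format)."""
    shown = ",".join(ids[:_SUMMARY_ID_PREVIEW])
    extra = len(ids) - _SUMMARY_ID_PREVIEW
    return f"{shown},+{extra}" if extra > 0 else shown


def summarize(attached, capped):
    """attached / capped: dicts mapping artifact type to a list of artifact IDs."""
    parts = []
    for art_type in sorted(set(attached) | set(capped)):  # alphabetical by type
        kept = attached.get(art_type, [])
        held = capped.get(art_type, [])
        part = f"{len(kept)} {art_type} ({_preview(kept)})"
        if held:
            part += f" +{len(held)} capped ({_preview(held)})"
        parts.append(part)
    return "; ".join(parts)
```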

Layer 6-A — Artifact embedding at promote

apollo/learner/similarity.py:compute_embedding reuses axonis.memory.embedder.embed (sentence-transformers, gated by the [memory] extra). The vector is stored on content.embedding_vector. Type-aware text extraction handles each artifact type's content shape (PromptShim text, FailurePattern signature+remediation, etc.).

Graceful degradation. When sentence-transformers is unavailable, compute_embedding returns None; the promote still succeeds with no embedding stored. Downstream similarity checks (6-B, 6-C) skip artifacts without embeddings.

Layer 6-B — Promote-time similarity advisory

After the embedding is computed, the promote handler scans active artifacts at the same (type, service_name, tool_name) scope and surfaces matches above the cosine threshold in ActionResult.similar_artifacts. Default threshold: APOLLO_SIMILARITY_THRESHOLD=0.9.

The advisory is informational only — promote still succeeds. Admin chooses whether to demote + supersede the prior(s) by re-promoting with supersede: true and the prior's IDs.
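
The promote-time scan reduces to a cosine comparison restricted to one scope. The sketch below uses plain Python lists for vectors; the function names, the top-level placement of service_name and tool_name, and the inclusive comparison against the threshold are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def scope_of(artifact):
    # Assumption: scope fields sit at the top level of the artifact record.
    return (artifact.get("type"), artifact.get("service_name"), artifact.get("tool_name"))


def similar_artifacts(candidate, active, threshold=0.9):
    """Return same-scope active artifacts whose embedding is close to the candidate's."""
    vec = candidate.get("content", {}).get("embedding_vector")
    if vec is None:
        return []  # no embedding stored: the advisory is skipped
    matches = []
    for other in active:
        other_vec = other.get("content", {}).get("embedding_vector")
        if other_vec is None or scope_of(other) != scope_of(candidate):
            continue
        score = cosine(vec, other_vec)
        if score >= threshold:
            matches.append({"id": other["id"], "cosine": score})
    return sorted(matches, key=lambda m: m["cosine"], reverse=True)
```

The returned list is advisory only; an empty list and a non-empty list both leave the promote itself untouched.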

Layer 6-C — Curator-time similarity sweep

apollo/learner/coalescer.py:run_periodic is a fifth background loop alongside snapshot, curator-auto, maintenance, and synthesis-sweep. Each tick:

  1. Loads all status=active artifacts.
  2. Partitions by (type, service_name, tool_name).
  3. Within each partition, union-finds clusters where every pairwise cosine ≥ APOLLO_COALESCER_THRESHOLD (default 0.85, slightly looser than 6-B's promote-time threshold).
  4. For each cluster, calls Apollo's LLM via build_coalesce_prompt to write a coherent merger.
  5. Records the merger as a proposal on apollo_proposals with supersedes: [id1, id2, ...] so admin promote demotes the components atomically.

Bounded per sweep: APOLLO_COALESCER_MAX_CLUSTERS_PER_RUN=5 (defensive LLM-cost cap). Off by default (APOLLO_COALESCER_ENABLED=false) — operators opt in once they're ready to budget the LLM calls and review the proposals.
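
Steps 2 and 3 of the sweep, plus the per-run cap, can be sketched for a single partition as below. Note that union-find on its own yields connected components, so a chain a~b~c can form even when a and c are dissimilar; this sketch adds an explicit all-pairs check and drops components that fail it. That handling, the function names, and the list-based vectors are assumptions, not the coalescer's actual behavior.

```python
import math
from itertools import combinations


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_clusters(vectors, threshold=0.85, max_clusters=5):
    """vectors: dict of artifact ID to embedding, all from one partition."""
    parent = {i: i for i in vectors}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Union every pair at or above the threshold.
    for a, b in combinations(vectors, 2):
        if cosine(vectors[a], vectors[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for i in vectors:
        groups.setdefault(find(i), []).append(i)

    clusters = []
    for members in groups.values():
        if len(members) < 2:
            continue
        # Keep only components where every pair clears the threshold.
        if all(cosine(vectors[a], vectors[b]) >= threshold
               for a, b in combinations(members, 2)):
            clusters.append(sorted(members))
    return sorted(clusters)[:max_clusters]  # defensive cap per sweep
```

Each returned cluster would then go to the LLM merge step and become one proposal with a supersedes list.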

promote() extends the supersede flag's semantics: when the proposal carries supersedes: [...], each listed artifact is demoted alongside the new promote, in the same atomic batch.

Metrics surface

Each layer adds telemetry so operators can see what's happening:

Metric | Source layer / purpose
apollo_guidance_attach_null_total{scope, reason} | observability over the attach path's null returns
apollo_guidance_attach_success_total{scope} | counterpart counter for successful attaches
apollo_guidance_attach_payload_bytes{scope} (histogram) | size growth; operators alert if it bloats
apollo_guidance_attach_artifact_count{scope} (histogram) | distribution of artifacts per attach
apollo_guidance_attach_capped_total{scope, artifact_type} | per-type drop counts (Layer 2 → 1)
apollo_evaluator_score_persisted_total / apollo_evaluator_score_persist_failed_total | Layer 4-A health
apollo_coalescer_proposals_emitted_total / apollo_coalescer_merge_failed_total | Layer 6-C health

A guidance_health block on GET /stats summarizes per-scope success/null breakdown for at-a-glance review.

Disabling layers

Every layer can be turned off independently:

APOLLO_GUIDANCE_ATTACH_ENABLED=false      # disables Layer 2 + everything above
APOLLO_EVALUATOR_PERSIST_SCORES_ENABLED=false  # Layer 4-A
APOLLO_SIMILARITY_ENABLED=false           # Layer 6-A + 6-B
APOLLO_COALESCER_ENABLED=false            # Layer 6-C (default off)
APOLLO_LINEAGE_PERSIST_ENABLED=false      # Layer 1

When disabled, the layer degrades to no-op; the rest of the system keeps running with the next-best signal.

Curator

The Curator is the only component empowered to mutate Apollo's memory. All mutations are bounded and auditable.

Allowed autonomous actions

  • Promote an artifact (increase its weight in guidance results)
  • Demote an artifact (hide from guidance without deleting)
  • Forget an artifact (delete after it has been demoted for N evaluation cycles)
  • Edit artifact metadata (tags, applicability, version, human-readable notes)
  • Summarize / compact raw observations into a new artifact

Disallowed actions (hard invariants)

  • Change auth or guardrails configuration
  • Widen or narrow a caller's tool access
  • Read or mutate another user's conversation data
  • Mint tokens, escalate privileges, or bypass OAuth
  • Call backend services on behalf of any user
  • Modify or delete audit log records

Versioning

Apollo uses a two-tier versioning model. Artifacts are versioned per-mutation; graphs are captured via snapshots (see §Snapshots and trajectory and §Retention). Both are in place from Phase 1 — versioning is cheap to establish up front and impossible to reconstruct retroactively once Curator empowerment goes live in Phase 3.

Artifacts (IntentPattern, FailurePattern, PromptShim, SpecFragment, ToolPairingHint, ServiceConnectionHint, CapabilityMap, DecisionTrajectory, DriftEvent, IntentSchema, SchemaDrift, PromptShape). Every mutation — autonomous Curator action, admin edit, synthesis-proposed edit, rollback — produces a new version:

  • Current version lives in apollo_artifacts.
  • Every prior version is copied to apollo_artifact_history before the mutation.
  • Each artifact record carries version, prev_version_id, change_reason, actor.
  • apollo_artifact_history has no expires_ts — prior versions are retained indefinitely as the rollback substrate (§Retention).
  • Rollback: POST /api/v1/apollo/artifacts/{id}/rollback with target version or prev_version_id replaces the current record and writes a new version whose prev_version_id points at the pre-rollback state (so rollback itself is a versioned event, recorded in audit).
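
The copy-before-mutate rule and rollback-as-a-new-version can be shown with an in-memory stand-in for the two indices. Everything here is a sketch: the class and method names and the version_id field are hypothetical, and prev_version_id is interpreted as pointing at the record that was current immediately before the write.

```python
import copy
import itertools


class ArtifactStoreSketch:
    def __init__(self):
        self.current = {}  # stand-in for apollo_artifacts
        self.history = []  # stand-in for apollo_artifact_history
        self._ids = itertools.count(1)

    def write(self, artifact_id, content, change_reason, actor):
        prev = self.current.get(artifact_id)
        if prev is not None:
            self.history.append(copy.deepcopy(prev))  # copy before mutating
        record = {
            "artifact_id": artifact_id,
            "version_id": f"v{next(self._ids)}",
            "version": prev["version"] + 1 if prev else 1,
            "prev_version_id": prev["version_id"] if prev else None,
            "change_reason": change_reason,
            "actor": actor,
            "content": copy.deepcopy(content),
        }
        self.current[artifact_id] = record
        return record

    def rollback(self, artifact_id, target_version, actor):
        target = next(
            (h for h in self.history
             if h["artifact_id"] == artifact_id and h["version"] == target_version),
            None,
        )
        if target is None:
            raise KeyError(f"no version {target_version} for {artifact_id}")
        # Rollback is itself a new version carrying the old content.
        return self.write(artifact_id, target["content"],
                          f"rollback to version {target_version}", actor)
```

Because rollback goes through the same write path, the version that was rolled away from also lands in history and stays available.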

Graphs (DecisionGraph). Per-observation node/edge mutations are too high-frequency to version individually. Graph rollback uses snapshots instead:

  • Hourly snapshots for 7 days, daily for 30 days, weekly for 90 days (per §Retention).
  • Admin rollback on a graph restores from a prior snapshot. Coarser granularity than artifact rollback by design.
  • Structural mutations initiated by admin or Curator on a graph (e.g., manually forgetting a node, merging two nodes) are tracked as audit events in apollo_audit with before/after snapshot IDs.

Audit log

Every Curator action, Evaluator-driven demotion, drift-hold, upstream artifact re-flag, and admin-chat state mutation writes a record to the Elastic apollo_audit index. The index follows the shared axonis Elastic convention (flat index, UDS shell, expires_ts, delete_by_query cleanup — see §Index mappings and templates and §Retention).

Record schema:

{
  "uds": {"timestamp": "...", "username": "...", "visibility": "..."},
  "create_ts": "...",
  "update_ts": "...",
  "schema_version": 1,
  "expires_ts": "...",                             // null if indefinite=true

  "action": "promote" | "demote" | "forget" | "edit" | "rollback" | "compact"
          | "drift_hold" | "upstream_flag" | "pause_curator" | "resume_curator",
  "actor": "curator_auto" | "evaluator_auto" | "admin:<username>",
  "trigger": "evaluator_score_below_threshold"
           | "l3_performance_cascade"
           | "drift_event"
           | "admin_manual"
           | "synthesis_proposal",

  "artifact_id": "...",
  "artifact_type": "FailurePattern",
  "before_version_id": "...",
  "after_version_id": "...",
  "related_drift_event_id": "...",                 // if trigger = drift_event

  "evaluator_score": 0.21,
  "score_decomposition": {                         // per §Evaluator outputs
    "l3_error": 0.45,
    "l3_schema_mismatch": 0.12,
    "user_feedback": 0.00,
    "evaluator_confidence": 0.08
  },

  "upstream_artifact_ids": ["ipat_abc", "pshim_xyz"],  // flagged artifacts, if cascade

  "rationale": "...",                              // REQUIRED human-readable explanation of WHY this action was taken.
                                                   //   LLM-synthesized for LLM-driven actions (synthesis proposals, drift flags).
                                                   //   Templated for deterministic actions (Evaluator-driven demotions —
                                                   //   composed from score_decomposition and trigger in prose form).
                                                   //   Always present on every audit record. Distinct from admin_note below.
  "evidence_ref": {                                // pointers to the underlying data the action drew on
    "observations": ["obs_...", "obs_..."],
    "graph_snapshot_id": "gs_...",
    "related_audit_ids": ["audit_..."]
  },

  "indefinite": false,                             // set true for critical admin actions
  "admin_note": "..."                              // OPTIONAL admin-supplied justification; separate from rationale
}

Rationale vs. admin_note. rationale is Apollo's own account of why it acted — always present, always auto-generated. admin_note is the admin's own commentary when they take an action — optional, human-supplied. Both are preserved and queryable.

Retention. Default 90 days (configurable via APOLLO_AUDIT_RETENTION_DAYS), enforced by the maintenance task's delete_by_query on expires_ts. Records marked indefinite: true have a null expires_ts and are never deleted — used for critical admin actions (forget of an artifact, pause/resume of Curator, rollback of a versioned artifact). The admin API allows setting indefinite when taking such actions.
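
The retention rule comes down to how expires_ts is set at write time and checked at cleanup time. The helper names below are hypothetical, and the in-memory check stands in for the maintenance task's delete_by_query.

```python
from datetime import datetime, timedelta, timezone


def audit_expires_ts(create_ts, indefinite=False, retention_days=90):
    """Return the expiry timestamp, or None for records kept indefinitely."""
    if indefinite:
        return None
    return create_ts + timedelta(days=retention_days)


def is_expired(record, now):
    """A record with a null expires_ts never expires."""
    expires = record.get("expires_ts")
    return expires is not None and expires <= now
```

Here retention_days plays the role of APOLLO_AUDIT_RETENTION_DAYS.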

Queryable. GET /api/v1/apollo/audit supports filters on time range, action, actor, artifact id, artifact type, trigger, and score-decomposition terms (e.g., "all demotions triggered primarily by L3 errors last 7 days"). Score decompositions let admins see why a score moved without re-deriving it from observations.

Evaluator

The Evaluator scores artifacts based on outcome correlation: after an artifact is published to guidance, do subsequent traces that used it produce better outcomes than traces that did not?

Inputs

  • Raw observations (trace outcomes)
  • Artifact usage records (which artifacts were returned in guidance, which were incorporated)
  • Explicit feedback signals

Failure signals (feeds the evaluator)

An event is considered a failure (negative signal for any artifact associated with its trace) if any of the following:

  1. Layer 3 returned an error (HTTP 5xx or tool exception) — Layer 3 performance signal. Applies to both agent and library observations. Under oracle-sole-observer (§Invariants 14), the observation is emitted by oracle; the signal keys on the envelope's service field (the observed L3 target), not on who performed the HTTP POST.
  2. Output schema mismatched the Layer 1 intent schema (Layer 3 performance signal). Applies only when the observed L3 service is an agent (component_kind == "agent" on its ServiceRegistry record). Libraries have no agent-level intent contract; their outputs are raw CRUD/compute results and schema mismatch is not evaluated for them. The Evaluator looks up component_kind by the envelope's service field at signal-application time: oracle is always the actual emitter, but the service it observed is what the contract keys on.
  3. User feedback was negative (thumbs-down, correction, abandoned conversation)
  4. Self-assessed evaluator confidence was below threshold

All four feed the Evaluator; signal 2 is gated on component_kind per the above. Weights are configurable via APOLLO_EVALUATOR_WEIGHTS.
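
The four signals and the component_kind gate can be sketched as follows. This is an illustrative sketch only: the TraceEvent shape, the signal names, and the 0.5 confidence threshold are assumptions made for the example, not definitions from this spec.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class TraceEvent:
    """Hypothetical, simplified view of one observed event."""
    service: str                      # envelope's service field (observed L3 target)
    component_kind: str               # "agent" or "library", from the registry record
    l3_error: bool = False            # HTTP 5xx or tool exception
    schema_mismatch: bool = False     # output did not match the L1 intent schema
    negative_feedback: bool = False   # thumbs-down, correction, abandonment
    confidence: Optional[float] = None


def failure_signals(event: TraceEvent, confidence_threshold: float = 0.5) -> Set[str]:
    """Return the names of the failure signals that fire for this event."""
    fired = set()
    if event.l3_error:
        fired.add("l3_error")                 # signal 1: agents and libraries
    if event.schema_mismatch and event.component_kind == "agent":
        fired.add("schema_mismatch")          # signal 2: agents only
    if event.negative_feedback:
        fired.add("user_feedback")            # signal 3
    if event.confidence is not None and event.confidence < confidence_threshold:
        fired.add("evaluator_confidence")     # signal 4
    return fired


print(failure_signals(TraceEvent("cortex", "library", schema_mismatch=True)))
print(failure_signals(TraceEvent("planner", "agent", schema_mismatch=True)))
```

The first call fires nothing because the observed service is a library; the second fires signal 2 because the observed service is an agent.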

Layer 3 performance carries amplified penalty

Signals 1 and 2 both reflect Layer 3 performance — what the backend services actually produced when acting on Apollo's guidance. If Layer 3 components are not performing well, that is a strong indication that the workflow generation (Layer 1 prompts) and the artifacts driving that generation need to be updated.

Accordingly, the Evaluator applies an amplified penalty to Layer 3 performance failures:

  • Default weight tiers: L3_performance: 3.0, user_feedback: 1.5, evaluator_confidence: 0.5.
  • Sustained L3 underperformance against a given artifact accelerates the Curator lifecycle:
      • Normal demotion cycle requires N=5 below-threshold evaluation cycles before forget.
      • L3-driven demotion triggers after N=2 cycles when signals 1 or 2 dominate the score. Rationale: if services are reliably failing on an artifact's guidance, waiting out a long demotion window lets bad guidance keep shaping traffic.
  • When an artifact's score degradation is attributable primarily to Layer 3 signals, the Evaluator additionally flags the upstream artifacts — the IntentPattern, PromptShim, or SpecFragment that shaped the Layer 1 prompt which in turn produced the Layer 3 call — for LLM review on the next synthesis trigger. The synthesis LLM may propose edits to those upstream artifacts, creating a cross-layer correction cycle.
  • Repeated L3 failures on the same artifact within a short window escalate to a DriftEvent (not just a score drop), forcing admin review rather than silent demotion.

Weights and thresholds are tunable via env vars (APOLLO_EVALUATOR_WEIGHT_L3_ERROR, APOLLO_EVALUATOR_WEIGHT_SCHEMA_MISMATCH, APOLLO_EVALUATOR_WEIGHT_USER_FEEDBACK, APOLLO_EVALUATOR_WEIGHT_CONFIDENCE, APOLLO_EVALUATOR_L3_FAST_DEMOTE_N).
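
One way to picture the amplified penalty and the fast-demotion window is the sketch below. It is a sketch under stated assumptions: both L3 signals are given the 3.0 tier weight, and "dominate" is read as "more than half of the weighted penalty"; neither detail is fixed by this spec, and the function names are hypothetical.

```python
from typing import Dict

# Default weight tiers quoted above; l3_error and schema_mismatch are assumed
# to share the L3 tier in this sketch.
WEIGHTS = {"l3_error": 3.0, "schema_mismatch": 3.0,
           "user_feedback": 1.5, "evaluator_confidence": 0.5}
L3_SIGNALS = {"l3_error", "schema_mismatch"}


def weighted_penalty(counts: Dict[str, int]) -> Dict[str, float]:
    """Per-signal penalty contributions for one evaluation cycle."""
    return {name: WEIGHTS[name] * n for name, n in counts.items()}


def cycles_before_forget(counts: Dict[str, int], normal_n: int = 5,
                         fast_n: int = 2, dominance: float = 0.5) -> int:
    """Pick the demotion window: fast when L3 signals dominate the penalty.

    The dominance threshold is an assumption of this sketch.
    """
    parts = weighted_penalty(counts)
    total = sum(parts.values())
    if total == 0:
        return normal_n
    l3_share = sum(v for k, v in parts.items() if k in L3_SIGNALS) / total
    return fast_n if l3_share > dominance else normal_n


print(cycles_before_forget({"l3_error": 2, "user_feedback": 1}))
print(cycles_before_forget({"l3_error": 1, "user_feedback": 3}))
```

In the first call the L3 share is 6.0 of 7.5, so the short window applies; in the second it is 3.0 of 7.5, so the normal window applies.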

Outputs

Per-artifact rolling score (exponential moving average). Scores feed the Curator's demote/forget policies. Scores are visible in admin stats and in the audit log when they trigger actions. Score decompositions (per-signal contributions) are retained so admins can see why a score moved — Layer 3 errors vs. user feedback vs. schema mismatch are distinguishable in the audit trail.
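
A minimal sketch of the rolling score with a retained decomposition follows. The smoothing factor alpha, the sign convention (penalties are negative), and the function name update_score are assumptions for illustration.

```python
from typing import Dict, Tuple


def update_score(prev_score: float, contributions: Dict[str, float],
                 alpha: float = 0.2) -> Tuple[float, Dict[str, float]]:
    """One exponential-moving-average step.

    contributions maps signal name -> signed contribution to this cycle's
    sample (negative values are penalties). Returns the new rolling score
    and each signal's share of the movement, suitable for an audit record.
    """
    sample = sum(contributions.values())
    new_score = (1 - alpha) * prev_score + alpha * sample
    decomposition = {name: alpha * value for name, value in contributions.items()}
    return new_score, decomposition


score, parts = update_score(1.0, {"l3_error": -3.0, "user_feedback": -1.5})
print(round(score, 3), parts)
```

Storing the per-signal parts next to the score is what lets an admin distinguish a drop driven by Layer 3 errors from one driven by user feedback without replaying observations.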

Admin Chat

A conversational interface to Apollo, gated by role admin via oracle's existing guardrails (component.oracle.gateway §Guardrails).

POST /api/v1/apollo/chat

Request body mirrors oracle's /chat:

{
  "message": "Forget everything Apollo learned about cohort X last week",
  "conversation_id": "apollo_admin_sess_...",
  "model": "default"
}

The admin chat uses Apollo's own LLM (separate from oracle's primary LLM) with a set of memory-management tools:

  • list_memories(filter)
  • get_memory(id)
  • forget_memory(id)
  • promote_artifact(id) / demote_artifact(id)
  • rollback_artifact(id, to_version)
  • rollback_graph(graph_id, to_snapshot)
  • trigger_synthesis(trace_id?)
  • explain_decision(trace_id | artifact_id | audit_id) — returns the rationale + evidence_ref for a Curator action
  • list_decisions(artifact_id?, since?, trigger?) — audit-filtered view of recent Curator actions
  • discuss_decision(artifact_id | audit_id) — opens a focused conversation thread: Apollo's LLM replies with the stored rationale, walks through the evidence (graph snapshot, score decomposition, upstream artifacts), and answers admin follow-ups. The admin can invoke promote/demote/rollback/forget tools inline in the same thread to act on the finding.
  • pause_curator() / resume_curator()

Admin ↔ Apollo conversation

Every Curator action carries a rationale written by Apollo at commit time and persisted in apollo_audit. Admin chat is the surface where those rationales become conversational: an admin asks "why did you just demote pshim_xyz?", Apollo's LLM retrieves the relevant audit record, reads out the rationale and evidence, and answers follow-up questions by re-reading the underlying observations and graph state.

This means admin chat is not just a command console — it is the review surface for Apollo's own findings. Admins can probe rationales, challenge them, and issue corrections (rollback, forget, edit, pause) without leaving the conversation. Every follow-up action is itself audited with actor: "admin:<username>" and a fresh rationale — so the admin's chain of reasoning is preserved in the audit log alongside Apollo's.

All admin-chat actions are logged to the audit index with actor: "admin:<username>".

Non-admin users cannot reach /api/v1/apollo/chat. Their interaction with Apollo is purely transitive, through oracle's own /chat.

Endpoints

REST (mounted under oracle's /api/v1/apollo/)

| Method | Path | Who | Purpose |
| --- | --- | --- | --- |
| POST | /api/v1/apollo/observations | Admin + out-of-process services | Admin replay/seed, plus the fallback ingest path for services outside oracle's MCP dispatch reach. Phase-1 emitters (oracle + cortex) do not use this endpoint; oracle emits on their behalf in-process (§Ingest Semantics). |
| GET | /api/v1/apollo/guidance?scope=l1 | Admin | Preview current L1-scoped artifact set |
| GET | /api/v1/apollo/guidance?scope=l3:<service> | Admin | Preview current L3-scoped artifact set for an agent |
| GET | /api/v1/apollo/guidance/schemas | Admin | Inspect learned intent schemas |
| GET | /api/v1/apollo/guidance/tools | Admin | Inspect tool descriptions / routing hints |
| GET | /api/v1/apollo/guidance/specs | Admin | Inspect spec fragments |
| GET | /api/v1/apollo/guidance/connections | Admin | Inspect service-connection hints |
| GET | /api/v1/apollo/guidance/stream?scope=<scope> | Admin | Real-time SSE feed of Curator commits (debugging only) |
| POST | /api/v1/apollo/chat | Admin | Conversational admin interface |
| GET | /api/v1/apollo/memories | Admin | List observations with filters |
| GET | /api/v1/apollo/memories/{id} | Admin | Inspect one observation |
| POST | /api/v1/apollo/memories | Admin | Seed an observation manually |
| PATCH | /api/v1/apollo/memories/{id} | Admin | Edit metadata (tags, notes) |
| DELETE | /api/v1/apollo/memories/{id} | Admin | Forget |
| GET | /api/v1/apollo/artifacts | Admin | List learned artifacts |
| GET | /api/v1/apollo/artifacts/{id} | Admin | Inspect one artifact + version history |
| PATCH | /api/v1/apollo/artifacts/{id} | Admin | Edit |
| POST | /api/v1/apollo/artifacts/{id}/promote | Admin | Promote |
| POST | /api/v1/apollo/artifacts/{id}/demote | Admin | Demote |
| POST | /api/v1/apollo/artifacts/{id}/rollback | Admin | Revert to a prior version |
| DELETE | /api/v1/apollo/artifacts/{id} | Admin | Forget |
| GET | /api/v1/apollo/audit | Admin | Query audit log |
| POST | /api/v1/apollo/learn | Admin | Manually trigger an Apollo synthesis pass |
| GET | /api/v1/apollo/stats | Admin | Apollo's own observability (counts, timings, scores) |

MCP (admin chat tools)

Apollo's MCP tools mirror the admin CRUD surface, exposed only to Apollo's own admin chat LLM (§Admin Chat) — not aggregated into oracle's user-facing /agentspace MCP catalog. The tools are served from a private MCP endpoint mounted by oracle.apollo.chat.server and reachable only through the admin-chat conversation; they are never visible to L1 or L3 LLMs.

  • apollo_list_memories, apollo_get_memory, apollo_forget_memory
  • apollo_list_artifacts, apollo_get_artifact, apollo_promote_artifact, apollo_demote_artifact, apollo_rollback_artifact, apollo_forget_artifact
  • apollo_list_graphs, apollo_get_graph_snapshot, apollo_rollback_graph
  • apollo_query_audit
  • apollo_trigger_synthesis
  • apollo_list_decisions, apollo_explain_decision, apollo_discuss_decision
  • apollo_pause_curator, apollo_resume_curator
  • apollo_stats

Authentication & Authorization

  • Admin endpoints require admin role via oracle's OAuth middleware + guardrails (component.oracle.gateway).
  • Guidance GET endpoints are admin-only. L1 and L3 never call them. They exist for admin inspection of what Apollo would currently inject.
  • Secondary ingest path (POST /api/v1/apollo/observations) accepts either the admin's Bearer token (for replay/seed) or, for any out-of-process emitter, the user's forwarded Bearer token — the same token oracle forwards downstream in its existing cross-service calls. There is no service-token infrastructure in axonis-core today; every cross-service call in the stack forwards the user's Keycloak-issued token (verified end-to-end against JWKS). Admin replay/seed additionally requires the admin role. Phase-1 emitters do not exercise this path.
  • Oracle's primary in-process path (all L1-relayed events + oracle's own llm_turn + oracle-observed L3 tool_output / tool_error + final_response) bypasses network auth — it is a direct function call within the same process, already authenticated at the ingress by OAuthMiddleware.
  • Neither L1 nor L3 authenticates to Apollo — neither layer addresses Apollo on any path (ingest or guidance). Both talk to oracle; oracle handles Apollo (§Invariants 14).
  • Injection channel (response-attached) rides the ambient auth of the envelope it is embedded in. The /chat response is already authenticated per the inbound /chat request; the outbound MCP dispatch is already authenticated per oracle's forwarded token. No additional auth layer is introduced for attached guidance.
  • Admin SSE debug feed uses the same OAuthMiddleware on connection handshake and is gated to the admin role.
  • Apollo honors all oracle guardrails. Curator cannot widen a caller's tool access. Attached guidance that references tools a subscriber cannot use is filtered out before the envelope is serialized.
  • Deferred: once a Keycloak client-credentials grant is introduced for service-to-service auth (noted in component.oracle.gateway as pending), APOLLO_SERVICE_TOKEN-authenticated ingest from background/batch workers becomes possible. Until then, ingest without a user token context is not supported.

Ingest Semantics

Observation ingest has two paths. The primary path, used by every Phase-1 emitter (oracle + cortex), is in-process only — oracle observes the envelopes flowing across its own boundaries and calls oracle.apollo.observer.ingest directly. The secondary path is the HTTP POST endpoint, mounted for admin replay/seed and for future services running outside oracle's MCP dispatch reach.

Primary path: in-process emission by oracle

Per §Invariants 14, neither L1 nor L3 addresses Apollo directly. Oracle is Apollo's sole emitter in production. On every inbound /chat request, oracle extracts L1 signals from the request body and calls the observer in-process. On every outbound MCP dispatch, oracle observes the round-trip and emits in-process on the L3 service's behalf:

| Event(s) | Emitted when | Emitter call site |
| --- | --- | --- |
| intent_schema, user_prompt, user_feedback | /chat request arrives or a feedback submission is posted | oracle/server/api/routes.py |
| llm_turn | oracle's own LLM request/response cycle completes | oracle/server/llm/tool_executor.py |
| tool_output, tool_error | an outbound MCP dispatch to an L3 service returns | oracle/server/llm/tool_executor.py + oracle/server/mcp/server.py (proxy path) |
| final_response | oracle is about to return the /chat response body | oracle/server/api/routes.py |

All emissions flow through helpers in oracle/apollo/hooks/chat.py which enqueue the envelope on the in-process async queue via oracle.apollo.observer.ingest.ingest(...). No network call. No authentication layer (the helpers live inside oracle's process, authenticated at the ingress by OAuthMiddleware). Failure modes are purely local: a full queue increments apollo_ingest_queue_dropped_total; an observer exception is caught and logged so the user request is unaffected.

Secondary path: HTTP POST (admin replay + out-of-process services)

The POST /api/v1/apollo/observations endpoint remains mounted on oracle's Starlette app for two use cases:

  1. Admin replay/seed — an admin manually re-ingests observations (e.g., to backfill after an outage or to seed synthetic test data). Requires the admin role.
  2. Services outside oracle's MCP dispatch reach — any future service whose outputs are not observable through an oracle-mediated MCP round-trip can emit via ApolloClient. None of the Phase-1 emitters use this path.

Endpoint:

POST /api/v1/apollo/observations
Content-Type: application/json
Authorization: Bearer <user-token>         # admin token for replay, or the user's forwarded token for out-of-process services
traceparent: 00-<trace-id>-<parent-span-id>-<flags>

{ "observations": [<envelope>, ...] }

A single envelope is always valid; the array form enables batching on the client. Apollo responds 202 Accepted as soon as every envelope is placed on the in-process queue. Per-envelope validation happens inside the background worker and is logged (not bubbled to the caller) so a single bad envelope does not fail a batch.
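
The request shape above can be assembled as in the sketch below. It is illustrative only: build_ingest_request is a hypothetical helper, the envelope fields shown are a reduced example, and the traceparent value follows the W3C Trace Context layout (version 00, 32-hex trace id, 16-hex parent span id, 2-hex flags).

```python
import json
import secrets
from typing import Dict, List, Tuple


def build_ingest_request(envelopes: List[dict], bearer_token: str,
                         trace_id: str, parent_span_id: str,
                         sampled: bool = True) -> Tuple[Dict[str, str], str]:
    """Build headers and JSON body for POST /api/v1/apollo/observations."""
    if len(trace_id) != 32 or len(parent_span_id) != 16:
        raise ValueError("traceparent needs a 32-hex trace id and a 16-hex span id")
    flags = "01" if sampled else "00"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {bearer_token}",
        "traceparent": f"00-{trace_id}-{parent_span_id}-{flags}",
    }
    # Envelopes are always wrapped in the "observations" array, even a single one.
    body = json.dumps({"observations": envelopes})
    return headers, body


headers, body = build_ingest_request(
    [{"event_type": "tool_output", "service": "cortex"}],  # illustrative envelope
    bearer_token="<user-token>",
    trace_id=secrets.token_hex(16),
    parent_span_id=secrets.token_hex(8),
)
print(headers["traceparent"])
```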

The HTTP POST is a fire-and-accept call. Because Apollo's request handler does nothing but enqueue, the server-side operation is a local memory write — never a WAN hop inside the request. Client-side timeouts can therefore be generous (default 30 s) without risking silent drops from network jitter: the handler always responds in sub-millisecond time on a healthy Apollo.

Client-side helper: ApolloClient

ApolloClient in axonis-core is the HTTP client used by the secondary path. Phase-1 services (oracle + cortex) do not import it — oracle emits in-process and cortex emits nothing at all. ApolloClient is retained so admin tooling and any future out-of-process emitter can reach the endpoint without a bespoke HTTP client.

ApolloClient.emit(envelope) does a single httpx.AsyncClient.post with:

  • A generous request timeout (APOLLO_INGEST_POST_TIMEOUT_SEC, default 30).
  • Bounded retries with exponential backoff + jitter on transient failures (APOLLO_INGEST_RETRY_ATTEMPTS, default 2; APOLLO_INGEST_RETRY_BASE_MS, default 200; APOLLO_INGEST_RETRY_CAP_MS, default 2000). Transient = timeout, 5xx, 429, connection error. 4xx except 429 is not retried.
  • Client-side batching via a size-or-interval hybrid: APOLLO_INGEST_BATCH_SIZE (default 50) or APOLLO_INGEST_FLUSH_INTERVAL_MS (default 500), whichever first.
  • Lifecycle flush on process shutdown (signal handler + atexit) and on explicit ApolloClient.flush() calls.
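
The retry policy in the list above can be sketched as two small functions. This is a sketch, not ApolloClient's actual code: the function names are hypothetical, and "full jitter" (a uniform draw up to the capped exponential) is one common scheme chosen here because the spec only says "exponential backoff + jitter".

```python
import random
from typing import Optional

BASE_MS = 200   # APOLLO_INGEST_RETRY_BASE_MS default
CAP_MS = 2000   # APOLLO_INGEST_RETRY_CAP_MS default


def is_transient(status: Optional[int]) -> bool:
    """status is the HTTP status code, or None for a timeout / connection error."""
    if status is None:
        return True
    return status == 429 or 500 <= status <= 599


def backoff_delay_ms(attempt: int, base_ms: int = BASE_MS,
                     cap_ms: int = CAP_MS) -> float:
    """Delay before retry number `attempt` (1-based), using full jitter."""
    ceiling = min(cap_ms, base_ms * 2 ** (attempt - 1))
    return random.uniform(0, ceiling)


for status in (503, 429, 404, None):
    print(status, is_transient(status))
print([round(backoff_delay_ms(n)) for n in (1, 2, 3)])
```

Jitter spreads retries from many clients over time, so a brief outage is not followed by a synchronized burst of POSTs.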

ApolloClient is pure HTTP — the same shape as axonis-core's RestClient and MCPClient (axonis_core/gateway/). No new transport primitive is introduced.

Server side: in-process async queue

Apollo's ingest handler is thin:

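import asyncio
from starlette.responses import JSONResponse

async def ingest_handler(request):
    # parse_body, metrics, and _INGEST_QUEUE are module-level names defined elsewhere.
    envelopes = parse_body(request)
    accepted = 0
    for env in envelopes:
        try:
            _INGEST_QUEUE.put_nowait(env)  # never blocks the request path
            metrics.incr("apollo_ingest_accepted_total", service=env.service)
            accepted += 1
        except asyncio.QueueFull:
            # Drop the envelope, but count it so the loss is visible on /stats.
            metrics.incr("apollo_ingest_queue_dropped_total", service=env.service)
    # Still 202 when some envelopes were dropped; the count covers enqueued ones only.
    return JSONResponse({"accepted": accepted}, status_code=202)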

The queue is bounded by APOLLO_INGEST_QUEUE_MAXSIZE (default 10000). When the queue fills, put_nowait raises QueueFull and Apollo increments apollo_ingest_queue_dropped_total, so the failure is never silent: it is visible on /stats under degraded_emitters.

A pool of background worker coroutines (APOLLO_INGEST_WORKER_CONCURRENCY, default 4) drains the queue. Each worker performs the full ingest: normalize → write to apollo_observations → update graphs → dispatch synthesis triggers per §Learner. Worker failures are logged and the envelope is reprocessed on a bounded retry budget (APOLLO_INGEST_WORKER_RETRY_ATTEMPTS, default 2) before being moved to a dead-letter log (APOLLO_INGEST_DEAD_LETTER_PATH, optional JSONL file; unset by default).
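
The worker pool can be sketched with plain asyncio. This is a simplified illustration under several assumptions: the process callback stands in for the full normalize, write, graph-update, and synthesis-dispatch pipeline; an in-memory list stands in for the optional JSONL dead-letter file; and a retry budget of 2 is read as two retries after the first attempt.

```python
import asyncio
import json
from typing import Awaitable, Callable, List


async def worker(queue: asyncio.Queue,
                 process: Callable[[dict], Awaitable[None]],
                 dead_letters: List[str],
                 retry_attempts: int = 2) -> None:
    """Drain the queue; retry each envelope a bounded number of times."""
    while True:
        env = await queue.get()
        try:
            for attempt in range(retry_attempts + 1):
                try:
                    await process(env)
                    break
                except Exception:
                    if attempt == retry_attempts:
                        # Retries exhausted: keep the envelope as one JSONL line.
                        dead_letters.append(json.dumps(env))
        finally:
            queue.task_done()


async def run_pool(envelopes, process, concurrency: int = 4) -> List[str]:
    queue: asyncio.Queue = asyncio.Queue(maxsize=10000)
    dead_letters: List[str] = []
    for env in envelopes:
        queue.put_nowait(env)
    tasks = [asyncio.create_task(worker(queue, process, dead_letters))
             for _ in range(concurrency)]
    await queue.join()      # wait until every envelope has been handled
    for task in tasks:
        task.cancel()       # workers loop forever, so stop them explicitly
    await asyncio.gather(*tasks, return_exceptions=True)
    return dead_letters


async def flaky(env: dict) -> None:
    if env.get("bad"):
        raise ValueError("cannot normalize")


dead = asyncio.run(run_pool([{"id": 1}, {"id": 2, "bad": True}], flaky))
print(dead)
```

Because each envelope is retried inside the worker that dequeued it, one poisoned envelope delays only that worker and cannot wedge the whole pool.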

Failure visibility

No silent failure modes exist on the ingest paths — primary (oracle in-process) and secondary (HTTP POST). Every failure kind is counted. The {service} label is the envelope's service field — the observed L3 target for Phase-1 emissions (oracle is the actual emitter but per-service visibility is what operators need).

| Metric | Meaning |
| --- | --- |
| apollo_ingest_accepted_total{service} | Envelopes successfully enqueued (both paths) |
| apollo_ingest_queue_dropped_total{service} | Envelopes rejected because the queue was full (both paths) |
| apollo_ingest_post_failure_total{service, kind} | Secondary-path POST failures after retries exhausted (timeout / 5xx / etc.). Never fires for Phase-1 emitters (they go in-process). |
| apollo_ingest_worker_failure_total{service} | Background-worker failures after retries (moved to dead-letter); applies to both paths |
| apollo_ingest_queue_depth | Current depth of the in-process queue |
| apollo_ingest_last_ingest_ts{service} | Timestamp of last successful enqueue per service; covers both oracle's in-process call and secondary-path POSTs |
| apollo_ingest_last_drain_ts{service} | Timestamp of last successful worker drain per service |

A service that should be active but whose last_ingest_ts is older than APOLLO_INGEST_STALE_WARN_SEC (default 300) is surfaced on /stats under degraded_emitters. Apollo itself is surfaced there when queue_depth exceeds APOLLO_INGEST_DEPTH_WARN (default 5000), since the queue is shared across services. For Phase-1 services, "degraded" means oracle stopped observing them (e.g., oracle hasn't dispatched an MCP call to cortex in five minutes), not that a POST failed.
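
The staleness and depth checks can be sketched as a pure function over the metric values. The function name, the expected_active list, and the "apollo" label used for the queue-depth case are assumptions for illustration.

```python
import time
from typing import Dict, List, Optional


def degraded_emitters(last_ingest_ts: Dict[str, float],
                      expected_active: List[str],
                      queue_depth: int,
                      now: Optional[float] = None,
                      stale_warn_sec: int = 300,
                      depth_warn: int = 5000) -> List[str]:
    """Return the entries to surface under degraded_emitters on /stats."""
    now = time.time() if now is None else now
    degraded = []
    for service in expected_active:
        last = last_ingest_ts.get(service)
        # A service that has never been seen is treated as stale.
        if last is None or now - last > stale_warn_sec:
            degraded.append(service)
    if queue_depth > depth_warn:
        degraded.append("apollo")   # the queue is global, so flag Apollo itself
    return degraded


print(degraded_emitters({"cortex": 100.0, "oracle": 390.0},
                        ["cortex", "oracle"], queue_depth=10, now=401.0))
```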

Dedup on at-least-once delivery

Client retries can produce duplicate envelopes. Apollo's observer deduplicates on (trace_id, event_type, timestamp, service) within APOLLO_INGEST_DEDUPE_WINDOW_SEC (default 300) before writing to Elastic.
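
A windowed dedupe on that key can be sketched in memory as below. The Deduper class is hypothetical; it assumes a single process and non-decreasing clock readings, and says nothing about how the real observer stores its window.

```python
from collections import OrderedDict
from typing import Tuple

Key = Tuple[str, str, str, str]


class Deduper:
    """Remember envelope keys for a fixed window and reject repeats."""

    def __init__(self, window_sec: float = 300.0):
        self.window_sec = window_sec
        self._seen: "OrderedDict[Key, float]" = OrderedDict()  # key -> first-seen time

    def is_duplicate(self, env: dict, now: float) -> bool:
        # Evict keys whose window has elapsed; insertion order is time order
        # as long as `now` never goes backwards.
        while self._seen:
            _, seen_at = next(iter(self._seen.items()))
            if now - seen_at <= self.window_sec:
                break
            self._seen.popitem(last=False)
        key = (env["trace_id"], env["event_type"], env["timestamp"], env["service"])
        if key in self._seen:
            return True
        self._seen[key] = now
        return False


dedupe = Deduper(window_sec=300)
env = {"trace_id": "t1", "event_type": "tool_output",
       "timestamp": "2025-01-01T00:00:00Z", "service": "cortex"}
print(dedupe.is_duplicate(env, now=0.0))    # first sighting
print(dedupe.is_duplicate(env, now=10.0))   # retry inside the window
print(dedupe.is_duplicate(env, now=400.0))  # window elapsed, treated as new
```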

Config knobs (all prefixed APOLLO_)

| Env var | Default | Purpose |
| --- | --- | --- |
| APOLLO_INGEST_BATCH_SIZE | 50 | Max envelopes per POST body |
| APOLLO_INGEST_FLUSH_INTERVAL_MS | 500 | Max time an envelope waits in client buffer before flushing |
| APOLLO_INGEST_POST_TIMEOUT_SEC | 30 | Per-POST HTTP timeout; generous, since the server handler is in-memory only |
| APOLLO_INGEST_RETRY_ATTEMPTS | 2 | Bounded client retries on transient failure |
| APOLLO_INGEST_RETRY_BASE_MS | 200 | Base delay for exponential backoff |
| APOLLO_INGEST_RETRY_CAP_MS | 2000 | Max delay between retries |
| APOLLO_INGEST_QUEUE_MAXSIZE | 10000 | Server-side in-process queue capacity |
| APOLLO_INGEST_WORKER_CONCURRENCY | 4 | Number of background worker coroutines draining the queue |
| APOLLO_INGEST_WORKER_RETRY_ATTEMPTS | 2 | Bounded worker retries before dead-letter |
| APOLLO_INGEST_DEAD_LETTER_PATH | unset | Optional JSONL path for envelopes moved to dead-letter after worker retries exhausted |
| APOLLO_INGEST_STALE_WARN_SEC | 300 | Seconds without a successful enqueue (in-process or POST) before an expected-active service is flagged |
| APOLLO_INGEST_DEPTH_WARN | 5000 | Queue-depth threshold for surfacing Apollo itself as degraded on /stats |
| APOLLO_INGEST_DEDUPE_WINDOW_SEC | 300 | Window for (trace_id, event_type, timestamp, service) dedupe on at-least-once delivery |

Layer 1 Intent Schema Obligation

Layer 1 is expected, but not required, to supply an intent_schema block with each request; oracle turns that block into an intent_schema observation on Layer 1's behalf. The obligation is best-effort throughout Phase 1 and Phase 2, with a configuration switch to make it required once Layer 1's schema contracts stabilize.

Best-effort mode (default)

  • Layer 1 SHOULD include an intent_schema block in every /chat request body it sends to oracle. Oracle extracts the block and emits the intent_schema observation to Apollo in-process (§Invariants 14 — L1 never addresses Apollo). A request without the block is still served; oracle simply emits no intent_schema observation for that trace.
  • If a schema is present on a trace, graph nodes are typed explicitly and the schema_mismatch failure signal (§Evaluator signal 2) is active for that trace.
  • If a schema is absent, Apollo's extractors fall back to prompt-inference and mark the resulting nodes inferred=true. Drift detection and evaluator confidence weight inferred nodes lower. The schema_mismatch signal is not evaluated for that trace; the L3-performance penalty (§Evaluator) still fires on signal 1 (hard errors), but signal 2 is dark.
  • GET /api/v1/apollo/stats reports intent_schema_coverage — percentage of traces with a Layer 1 schema in the last rolling window — so admins can see when Layer 1 coverage is high enough to flip to required.

Required mode

  • APOLLO_REQUIRE_INTENT_SCHEMA=true flips behavior: oracle rejects inbound /chat requests whose body lacks an intent_schema block with a 400 at the ingress — L1 is the direct caller and sees the rejection. Traces without a schema are never created; nothing to drop at the Observer layer.
  • The flip is a config change, not a code change. No Apollo, oracle, or L1 redeploy is needed — but Layer 1's /chat emission behavior must already include the schema or the flip will start rejecting real traffic.
  • Phase 3 is the expected time to flip, once Curator empowerment demands the cleaner signal. Admin can flip earlier if stats show high coverage.
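
The two modes can be sketched as a single ingress decision. This is a sketch under assumptions: intent_schema_gate and intent_schema_coverage are hypothetical names, the flag is parsed as the literal string "true", and 200 stands in for "the request proceeds to normal handling".

```python
import os
from typing import Optional, Tuple


def intent_schema_gate(body: dict, require: Optional[bool] = None) -> Tuple[int, bool]:
    """Return (http_status, emit_intent_schema_observation) for a /chat body."""
    if require is None:
        require = os.environ.get("APOLLO_REQUIRE_INTENT_SCHEMA", "false").lower() == "true"
    has_schema = "intent_schema" in body
    if require and not has_schema:
        return 400, False      # required mode: reject at the ingress
    return 200, has_schema     # best-effort: serve, emit only if the block exists


def intent_schema_coverage(traces_with_schema: int, total_traces: int) -> float:
    """Percentage of traces in the window that carried a Layer 1 schema."""
    return 100.0 * traces_with_schema / total_traces if total_traces else 0.0


print(intent_schema_gate({"message": "hi"}, require=False))
print(intent_schema_gate({"message": "hi"}, require=True))
print(intent_schema_coverage(3, 4))
```

Reading the flag at request time rather than at import time is what makes the flip a pure configuration change in this sketch.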

Logging

Every Apollo module and every service participating in Apollo's observation / injection loop uses the axonis-core logger rather than a module-local logging.getLogger() call. The logger module is axonis.logger, which implements the three-logger convention (log, error, audit) with consistent handler shapes so logs from any component read coherently when aggregated.

Three loggers, three audiences

| Logger | When to use | Destination |
| --- | --- | --- |
| log | Normal operational telemetry: info, warning, debug. | Console + axonis.log |
| error | Exceptions, permanent failures, data-loss events, misconfiguration. | Console + error.log |
| audit | Important transactions that must be traceable independently of volume. | audit.log (file only) |

Import pattern:

from axonis.logger import log, error, audit

What counts as audit-worthy

Apollo MUST route the following transactions through the audit logger so they leave a trail in audit.log separate from regular operational noise:

  • Worker pool start / shutdown / cancellation (§Ingest Semantics).
  • Graph snapshot completion (§Snapshots and trajectory) — per hour.
  • Every Curator action — promote, demote, forget, edit, rollback, compact, drift-hold, upstream-flag, pause_curator, resume_curator. (Complementary to the apollo_audit Elastic index: the audit log captures the event as structured text alongside other platform audit events; the Elastic index is the queryable, structured source of truth.)
  • LLM synthesis proposals that result in a Curator commit (the proposal → drift-check → commit boundary).
  • DriftEvent creation (§Graph-anchor drift check).
  • Admin chat actions that mutate state, logged with actor: "admin:<username>".
  • Guidance injection commits: each commit that changes the apollo_guidance payload pushed onto outbound envelopes is audited. The per-turn attachment of that payload is a delivery detail and is not audited.
  • Subscriber connection / disconnection events on the admin SSE debug feed.

What stays in log / error

  • Per-observation ingest (log.info / log.debug) — too high-volume for audit.
  • Per-attach-turn emissions — ditto.
  • Retry attempts, transient failures — log.warning.
  • Timeouts on the attach path (graceful degradation) — log.warning.
  • Queue overflow, exhausted retries, worker failures — error.
  • Exceptions swallowed by the hot path — error so they still land in error.log without propagating into the request path.

Rationale

Splitting the three channels keeps audit.log the single place an operator or admin-chat tool can scan when investigating a system-level state change without being drowned in routine telemetry. Separating error.log keeps every permanent-failure signal (data loss, persistent outage, contract violation) in one place regardless of which module it came from.

Failure Posture

  • Apollo slow on attach: the in-process guidance fetch is bounded by APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS (default 10 ms). On overshoot, oracle serializes the /chat response or MCP dispatch without apollo_guidance. Subscribers proceed without guidance on that turn — equivalent to pre-Apollo behavior. User sees no failure; metric apollo_guidance_attach_timeout_total surfaces the event.
  • Apollo unreachable as a process: since Apollo is a package inside oracle, "Apollo unreachable" means oracle is itself broken, which is a larger incident. If the Apollo module fails to import or initialize at startup, oracle continues serving /chat and tool dispatches without apollo_guidance attached. Ingest endpoint returns 503.
  • Ingest queue full: POST /api/v1/apollo/observations responds 202 but increments apollo_ingest_queue_dropped_total{service}. Never silent — visible on /stats under degraded_emitters.
  • Ingest client POST fails: client retries within budget, then drops the batch and increments apollo_ingest_post_failure_total{service, kind}. Visible on /stats. Emitter's task continues unaffected (observations are telemetry, not transactional).
  • Apollo worker crashes mid-ingest: at-least-once redelivery from the asyncio queue; observer dedupes on (trace_id, event_type, timestamp, service).
  • Apollo hallucinates a bad artifact: subscribers apply it on one turn; the Evaluator detects outcome degradation on subsequent observations and demotes; the next attached payload reflects the demotion. Admin can force-rollback via audit log at any time.
  • Curator goes rogue: every action (mutation + commit) is in the audit log; admin can pause_curator() immediately via chat or CLI. Paused Curator → artifact set stops changing → attached payloads continue to reflect the as-of-pause state until resume.
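
The slow-attach posture in the first bullet can be sketched with asyncio.wait_for. This is illustrative only: attach_guidance and the fetch callback are hypothetical, and the demo uses a 50 ms budget instead of the 10 ms default to keep it stable on a busy machine.

```python
import asyncio
from typing import Awaitable, Callable

ATTACH_TIMEOUT_MS = 10  # APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS default


async def attach_guidance(envelope: dict,
                          fetch: Callable[[], Awaitable[dict]],
                          timeout_ms: int = ATTACH_TIMEOUT_MS) -> dict:
    """Attach guidance if the fetch finishes in time; otherwise send without it."""
    try:
        guidance = await asyncio.wait_for(fetch(), timeout=timeout_ms / 1000)
    except asyncio.TimeoutError:
        # A real implementation would increment
        # apollo_guidance_attach_timeout_total here.
        return envelope
    return {**envelope, "apollo_guidance": guidance}


async def _demo():
    async def fast() -> dict:
        return {"hints": []}

    async def slow() -> dict:
        await asyncio.sleep(0.5)
        return {"hints": []}

    return (await attach_guidance({"reply": "ok"}, fast, timeout_ms=50),
            await attach_guidance({"reply": "ok"}, slow, timeout_ms=50))


with_guidance, without_guidance = asyncio.run(_demo())
print(with_guidance)
print(without_guidance)
```

On overshoot the caller gets the original envelope back unchanged, which matches the pre-Apollo behavior described above.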

Apollo's LLM

Apollo runs its own LLM, separate from oracle's user-facing LLM routing. Apollo's LLM is the primary driver of synthesis, invoked per event (see §Learner → LLM synthesis).

Model: pluggable by design

The model is selected by configuration and must remain swappable without code changes. Apollo's LLM provider layer normalizes across providers so that a newer, stronger model can replace the current one as the state of the art advances.

Current default: MiniMax M2.7.

It is the best-available fit at the time of this spec given its context window, cost profile, and availability — but the spec is deliberately agnostic. Apollo must not encode MiniMax-specific assumptions in prompt shapes, input formats, or response parsers. The provider layer handles any per-model translation.

Operators can swap the model by changing env vars only:

APOLLO_LLM_PROVIDER=minimax            # current default; swap with any provider registered in the router
APOLLO_LLM_MODEL=m2.7                  # current default; replace with a newer model when available
APOLLO_LLM_API_KEY=...
APOLLO_LLM_BASE_URL=...                # for self-hosted or proxied inference
APOLLO_SYNTHESIS_MAX_CONCURRENT=4      # cap on concurrent synthesis calls (event bursts from L3)
APOLLO_GUIDANCE_TIMEOUT_MS=50          # timeout for admin GET /guidance* inspection calls

The LLM router (oracle/apollo/llm.py) must support MiniMax as a first-class provider alongside anthropic / openai / groq / trinity / ollama in oracle's existing router. New providers register through the same interface — adding a model is an additive router change, never a change to Apollo's business logic.

Local MiniMax via HuggingFace (native, pre-trained, on-disk)

Apollo's LLM layer reserves a provider slot for a locally-stored, HuggingFace-pulled MiniMax model — a complement to the default OpenAI-compatible endpoint path. This path is intended for deployments where one or more of the following holds:

  • The cluster is air-gapped and cannot reach MiniMax's hosted endpoint.
  • Operators prefer running inference on their own GPU inventory for latency, cost, or data-governance reasons.
  • A fine-tuned MiniMax variant the operator owns needs to be loaded instead of the stock checkpoint.

Provider selector. Set APOLLO_LLM_PROVIDER=minimax-local (see §Environment Configuration). The openai provider continues to be the default for hosted deployments; nothing about the hosted path changes.

Canonical HuggingFace load signature. The provider MUST honor the model card's canonical call shape — the same two lines the MiniMax team publishes on the model page:

from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("MiniMaxAI/MiniMax-M2.7", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("MiniMaxAI/MiniMax-M2.7", trust_remote_code=True)

trust_remote_code=True is required because MiniMax ships its own tokenizer and modeling code alongside the weights. A future operator who swaps to a fine-tune with custom modeling code needs the flag too.

On-disk location (HuggingFace convention). HuggingFace caches pulled models under:

${HF_HOME:-~/.cache/huggingface}/hub/models--MiniMaxAI--MiniMax-M2.7/
    blobs/               # content-addressed weight shards
    snapshots/<sha>/     # symlinks to blobs for the resolved revision
    refs/                # branch/tag pointers

The exact layout is HuggingFace's; Apollo does not parse or override it. Operators pre-pull the model with either of:

huggingface-cli download MiniMaxAI/MiniMax-M2.7
# or, equivalently, any Python that calls from_pretrained() once to warm the cache.

Pre-pulling is the recommended pattern for production: the first from_pretrained call in a cold container downloads tens of gigabytes of weights, which is not acceptable on the request path. Pre-pull during image build or via an init container.

Operator-controlled path (reserved knob, not yet implemented). A future enhancement will add APOLLO_LLM_LOCAL_MODEL_PATH for operators who keep weights outside the HF cache (e.g., a mounted shared filesystem with a custom fine-tune). When set, the provider passes the path verbatim as the first positional argument to from_pretrained instead of the MiniMaxAI/MiniMax-M2.7 model id. Until that knob is wired, the provider loads only the stock MiniMax checkpoint from the HF cache.
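
The intended behavior of the reserved knob can be sketched as a small resolver. Since the knob is not yet implemented, this is a sketch of the described design rather than shipped code; resolve_model_ref is a hypothetical name, and the from_pretrained call is left as a comment so the example runs without transformers or the weights.

```python
import os
from typing import Mapping

DEFAULT_MODEL_ID = "MiniMaxAI/MiniMax-M2.7"


def resolve_model_ref(env: Mapping[str, str] = os.environ) -> str:
    """Return the first positional argument to pass to from_pretrained."""
    path = env.get("APOLLO_LLM_LOCAL_MODEL_PATH", "").strip()
    # An unset or empty knob falls back to the stock model id in the HF cache.
    return path if path else DEFAULT_MODEL_ID


# Intended use (not executed here; needs transformers and the weights):
#   tokenizer = AutoTokenizer.from_pretrained(resolve_model_ref(), trust_remote_code=True)
#   model = AutoModelForCausalLM.from_pretrained(resolve_model_ref(), trust_remote_code=True)

print(resolve_model_ref({}))
print(resolve_model_ref({"APOLLO_LLM_LOCAL_MODEL_PATH": "/mnt/models/minimax-ft"}))
```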

Disk + GPU requirements. MiniMax-M2.7 is a large model: expect the checkpoint to land in the tens of gigabytes on disk, and plan for a GPU (or multi-GPU node) with enough VRAM for the resolved context window. Deployments that cannot meet those budgets should stay on the hosted endpoint path.

What ships today vs. what is deferred. The minimax-local provider is scaffolded in oracle/apollo/llm.py at Milestone 8 — it imports transformers lazily, honors the canonical load signature above, and can complete a prompt on a machine that has the weights and deps in place. The following production-grade enhancements are intentionally deferred to a later milestone and tracked under §Deferred below:

  • Thread-pool / process-pool offload of the synchronous HF forward pass (today the call runs inline on the event loop).
  • Explicit device mapping (device_map="auto", torch_dtype, bitsandbytes / 4-bit / 8-bit quantization knobs).
  • APOLLO_LLM_LOCAL_MODEL_PATH operator override for non-HF-cache paths.
  • Pre-pull orchestration + readiness gate (block APOLLO_LLM_PROVIDER=minimax-local deployments from serving until the checkpoint is resident and the forward pass warms successfully).
  • Streaming tokens through the provider abstraction (admin chat UX).

Until those land, minimax-local is a dev-time and air-gapped-lab-time fallback; the default production pattern remains APOLLO_LLM_PROVIDER=openai with APOLLO_LLM_BASE_URL pointed at an OpenAI-compatible MiniMax endpoint.

Separation from oracle's user-facing LLM

Apollo's LLM configuration is independent of oracle's user-facing LLM routing. The two can use the same provider or different providers; the same model or different models. Apollo's usage is tracked separately via the Meter (component.oracle.gateway §Metering) under client id apollo. User-facing chat metrics and Apollo metrics are separate in dashboards.

Apollo owns its LLM client — not axonis-core's Client

axonis-core provides the platform's shared LLM client (axonis.llm.Client; platform.axonis-core#llm-pattern) for user-facing chat (L1/L2) and simple backend services. Apollo does not consume it. It keeps a purpose-built client at oracle/apollo/llm.py (LLMClient), because the Curator and admin-chat have requirements the shared, deliberately-lightweight core client does not — and should not — carry. Each difference is load-bearing, not incidental duplication:

| Apollo capability | Why Apollo needs it | Why it stays out of core Client |
|---|---|---|
| response_format="json" + LLMResponse.as_json() / parsed | The Curator/synthesis path demands strict-JSON proposals against a documented schema; as_json() parses defensively and returns None on a malformed body so the dispatcher routes to drift review rather than dropping a bad proposal (apollo/learner/prompts.py, synthesis.py). | Core Client is a generic completion surface: no JSON-format biasing, no parse / None-on-malformed contract. Adding it would couple core to Apollo's synthesis semantics. |
| tool_choice control (auto / none / required) + response_format="text" | Apollo's admin chat (apollo/chat/server.py) streams prose and offers inspection tools, steering / forcing / suppressing tool use per turn. | Core's OpenAI-compatible path is raw httpx and hardcodes tool_choice="auto" with no response_format, by design, to stay minimal. |
| minimax-local provider (in-process MiniMax-M2.7 via HuggingFace transformers) | Dev / air-gapped deployments with no network path to a hosted MiniMax endpoint. | platform.axonis-core mandates axonis-core stay lightweight with no ML dependencies; torch / transformers must not enter core. |
| openai SDK transport (not raw httpx) | First-class response_format + tool_choice against MiniMax's OpenAI-compatible endpoint. | Core uses raw httpx for its OpenAI-compatible providers and does not depend on the openai SDK there. |
| Stub + singleton harness (install_stub_response / install_stub_stream, reset_singleton, get()) | Synthesis / curator / admin-chat tests inject canned responses and stream programs to exercise dispatch and drift-routing deterministically, with no network or GPU. | Core Client is stateless and harness-free; its consumers mock at the call boundary instead. |

Consequently Apollo's LLMResponse (content, parsed, tokens_in, tokens_out, tool_calls: list[ToolCall(id, name, arguments)]) and core's Response (text, tool_calls: list[ToolCall(id, name, input)], stop_reason, input_tokens, output_tokens) differ deliberately — two value objects for two jobs, consumed by disjoint call sites (Apollo's synthesis / admin-chat code + tests vs. oracle's chat loop).
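
The None-on-malformed contract of as_json() can be sketched as follows. The field names come from the description above; the field types, the defaults, and the rule that only a JSON object counts as a valid proposal are assumptions made for illustration, not the shipped implementation.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ToolCall:
    id: str
    name: str
    arguments: dict[str, Any]  # assumed type


@dataclass
class LLMResponse:
    content: str
    tokens_in: int = 0
    tokens_out: int = 0
    tool_calls: list[ToolCall] = field(default_factory=list)
    parsed: Optional[dict[str, Any]] = None

    def as_json(self) -> Optional[dict[str, Any]]:
        """Parse content as a JSON object; return None instead of raising."""
        try:
            value = json.loads(self.content)
        except (json.JSONDecodeError, TypeError):
            return None
        # Assumption: a proposal must be a JSON object, not a list or scalar.
        if not isinstance(value, dict):
            return None
        self.parsed = value
        return value
```

A caller can then branch on the result: a dict goes on to validation, and None is routed to drift review rather than being dropped.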

Decision (2026-06-04): Apollo's LLM client is NOT converged onto axonis-core's Client. The two share design DNA (provider-agnostic completion + streaming with a typed terminal response) but serve different layers. Converging would either strip the Curator's JSON-synthesis robustness and the admin-chat controls, or push Apollo-specific features and heavy ML dependencies into the lightweight core. The only genuine overlap — raw provider calling — cannot be cleanly shared anyway, because Apollo requires response_format / tool_choice that core's httpx path omits. Revisit only if Apollo's needs converge with the user-facing client (e.g. it drops the local-model and strict-JSON-synthesis requirements).

Budget isolation

A burst of synthesis calls triggered by a long fusion run must not starve user-facing chat. Apollo's LLM has its own rate limit, its own quota, and its own metering client id. When Apollo's quota is exceeded it defers synthesis (the event queue holds triggers up to a cap); user-facing oracle chat is unaffected.
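
A minimal sketch of the deferral behaviour, assuming a simple counter for the quota and a bounded FIFO for held triggers. SynthesisBudget and its methods are hypothetical names, and the spec does not say what happens once the cap is reached; counting and dropping the overflow is an assumption here.

```python
from collections import deque


class SynthesisBudget:
    """Hypothetical gate: spend quota when available, otherwise defer."""

    def __init__(self, quota: int, queue_cap: int) -> None:
        self.remaining = quota
        self.queue_cap = queue_cap
        self.deferred: deque = deque()
        self.dropped = 0

    def submit(self, trigger: str) -> str:
        if self.remaining > 0:
            self.remaining -= 1
            return "run"
        if len(self.deferred) < self.queue_cap:
            self.deferred.append(trigger)
            return "deferred"
        # Assumption: triggers past the cap are counted and dropped.
        self.dropped += 1
        return "dropped"

    def refill(self, quota: int) -> list:
        """Restore quota and release as many deferred triggers as it covers."""
        self.remaining = quota
        released = []
        while self.deferred and self.remaining > 0:
            released.append(self.deferred.popleft())
            self.remaining -= 1
        return released
```

Because the gate holds only Apollo's own counter and queue, exhausting it has no effect on the user-facing chat path.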

Drift Prevention

Apollo influences a large fraction of the system. Drift in its artifacts cascades into the prompts of layer 1 and the outputs of layer 3. The spec encodes several anti-drift guarantees:

  1. Observation cadence is fixed and coarse. No per-token events are emitted, which keeps the signal-to-noise ratio of the raw data high.
  2. Graphs are the deterministic anchor. Decision graphs update on every observation via rule-based extractors — they never create free-form artifacts and cannot drift on their own. They record what actually happened, not what the LLM thinks happened.
  3. Every LLM output is checked against the graphs. The LLM is the primary driver of synthesis, but every artifact, promotion, and pattern it proposes is validated against current graph state and trajectory before the Curator commits it. Proposals that diverge from graph-recorded reality are flagged as DriftEvent and held for admin review.
  4. Drift is detected structurally, not rate-limited. Short-window vs. long-window edge-weight divergence, rate-of-new-nodes caps, LLM-output-vs-graph divergence, and trajectory breaks distinguish smooth evolution (allowed) from sudden shift (flagged). Apollo can learn continuously because the graphs provide a rigid referent.
  5. Curator is bounded. Cannot touch auth, guardrails, or user data — only Apollo's own artifacts.
  6. Every Curator action is auditable. Admin can see what changed, when, why, and by whom.
  7. Every artifact is versioned. Rollback is always possible.
  8. Evaluator closes the loop. Artifacts that stop correlating with good outcomes decay automatically.
  9. Admin can pause the Curator. An emergency off-switch prevents runaway mutation.
  10. Guidance degrades gracefully. If Apollo is slow or wrong, oracle falls through without injection — the base system still functions.
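
The short-window vs. long-window comparison in item 4 can be illustrated with two exponentially weighted moving averages over the same edge. This is a sketch under stated assumptions: the smoothing factors and the absolute tolerance are hypothetical values, and the real detector may use a z-score (see APOLLO_DRIFT_Z_SCORE_THRESHOLD) rather than this plain difference.

```python
from dataclasses import dataclass


@dataclass
class EdgeWeight:
    short: float = 0.0
    long: float = 0.0

    def update(self, x: float, alpha_short: float = 0.5,
               alpha_long: float = 0.05) -> None:
        # Standard EWMA: new = alpha * sample + (1 - alpha) * old.
        self.short = alpha_short * x + (1 - alpha_short) * self.short
        self.long = alpha_long * x + (1 - alpha_long) * self.long

    def diverged(self, tolerance: float = 0.5) -> bool:
        # Assumption: flag when the two windows differ by more than tolerance.
        return abs(self.short - self.long) > tolerance
```

A slow ramp moves both averages together and stays under the tolerance, while a sudden jump pulls the short window away from the long one and is flagged.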

Phased Rollout

Apollo ships in three phases to manage scope and risk.

Phase 1 — Observe and ground

  • Observer with oracle as the sole Phase-1 emitter — oracle calls oracle.apollo.observer.ingest in-process for every L1-, L2-, and L3-origin event. POST /api/v1/apollo/observations is mounted for admin replay/seed and future out-of-process emitters (not used by oracle or cortex).
  • Memory indices live: apollo_observations, apollo_graph_nodes, apollo_graph_edges, apollo_graph_snapshots
  • Deterministic graph updates on every ingested observation (extractors, node/edge upserts, EWMA weights)
  • Hourly graph snapshots + maintenance task (delete_by_query on expires_ts, snapshot coarsening)
  • Guidance API serves graph-derived context only (no artifacts yet; artifacts index is empty)
  • Admin memory and graph CRUD endpoints
  • Admin chat (read-only — can inspect graphs, observations, lineage; cannot yet promote/demote)
  • Backend emitter integration: oracle itself + cortex (parallax deferred to a follow-on phase per Q7); cortex is the Phase 1 L3 subscriber per Q20. Beacon (L1) is deferred until a beacon↔oracle connection is designed.
  • Apollo is additive: Apollo lives entirely under oracle/apollo/ with its own memory, indices, and stores. Oracle's existing memory modules (oracle/server/memory/conversation.py, oracle/server/memory/cross_service.py, oracle/server/models/memory.py) remain untouched and continue to serve their current callers.

Phase 2 — Synthesize and advise

  • Event-driven LLM synthesis loop live (triggers: L1 request ingested, L3 output ingested, admin chat, admin-initiated)
  • Artifact creation, editing, promotion, demotion (IntentPattern, FailurePattern, ToolPairingHint, etc.)
  • Graph-anchor drift check on every LLM synthesis output
  • DriftEvent artifacts produced on divergence; flagged proposals held for admin review
  • Evaluator scoring active (weighted failure signals feed rolling per-artifact scores)
  • Injection Channel live. Oracle attaches current applicable guidance to every /chat response body (L1) and to every outbound MCP dispatch bound for an agent-kind L3 service. Attach budget APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS (default 10 ms) omits the field on overshoot without failing the request.
  • Admin inspection endpoints (GET /guidance?scope=..., admin-only SSE debug feed) live
  • Admin chat fully active (can trigger synthesis, promote/demote/rollback artifacts, pause Curator)
  • Remaining backend emitters onboard (see §Integration Backlog)
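
The attach budget described for the Injection Channel could be enforced along these lines. The function and parameter names are hypothetical; only the apollo_guidance field, the 10 ms default, and the omit-on-overshoot behaviour come from the text above.

```python
import asyncio
from typing import Any, Awaitable, Callable

ATTACH_TIMEOUT_MS = 10  # default stated for APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS


async def attach_guidance(
    body: dict[str, Any],
    fetch: Callable[[], Awaitable[dict[str, Any]]],
    timeout_ms: int = ATTACH_TIMEOUT_MS,
) -> dict[str, Any]:
    """Return body with apollo_guidance attached, or unchanged on failure."""
    try:
        guidance = await asyncio.wait_for(fetch(), timeout=timeout_ms / 1000)
    except Exception:
        # Timeout or internal failure: omit the field, never fail the request.
        return body
    return {**body, "apollo_guidance": guidance}


async def fast_fetch() -> dict[str, Any]:
    """Demo stub: guidance that is ready immediately."""
    return {"as_of": "t0", "artifacts": [], "rationale_summary": ""}


async def slow_fetch() -> dict[str, Any]:
    """Demo stub: guidance that overshoots the budget."""
    await asyncio.sleep(0.5)
    return {"as_of": "t0", "artifacts": [], "rationale_summary": ""}
```

With fast_fetch the response carries apollo_guidance; with slow_fetch the same response is returned without the field, and a real implementation would also increment the attach-timeout counter at that point.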

Phase 3 — Empower and maintain

  • Curator autonomous actions enabled (promote/demote/forget without admin approval, bounded by §Curator hard invariants)
  • Evaluator-driven demotion and forgetting cycles (score below threshold for N=5 cycles → forget, audited)
  • LLM-driven compaction of expiring observations into summary artifacts (event-driven, at admin-initiated synthesis)
  • Full audit and rollback surface live (apollo_audit index + apollo_artifact_history)
  • Oracle's existing memory modules (conversation.py, cross_service.py, models/memory.py) remain in place. Any consolidation is out of scope for the Apollo rollout — see §Deferred: Consolidation of Oracle Memory Modules.
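
The Evaluator-driven forgetting rule (score below threshold for N=5 cycles) can be sketched as a streak counter. The function name is hypothetical, and treating the five cycles as consecutive, with any recovery resetting the streak, is an assumption the text does not confirm.

```python
from typing import Optional

FORGET_AFTER_CYCLES = 5  # N from the Phase 3 bullet above


def cycle_of_forgetting(scores: list[float], threshold: float) -> Optional[int]:
    """Return the 1-based cycle at which an artifact is forgotten, else None."""
    below = 0
    for cycle, score in enumerate(scores, start=1):
        # Assumption: a score at or above threshold resets the streak.
        below = below + 1 if score < threshold else 0
        if below >= FORGET_AFTER_CYCLES:
            return cycle
    return None
```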

Deferred: Consolidation of Oracle Memory Modules

Oracle today has three memory modules that predate Apollo:

  • oracle/server/memory/conversation.py — Redis-backed multi-turn conversation history (ConversationStore)
  • oracle/server/memory/cross_service.py — Redis KV cross-namespace fact store (CrossServiceMemory)
  • oracle/server/models/memory.py — a stub over axonis-core's Memory(UDS) primitive

Apollo provides overlapping capabilities: conversation lineage is reconstructible from user_prompt + final_response observations; fact storage is superseded by artifacts + graphs; the UDS memory class is already the canonical substrate under platform.axonis-core.

Recommendation (deferred, not in scope): once Apollo is proven in production and its graphs/artifacts demonstrably cover the use cases served by these modules, the three can be consolidated:

  • conversation.py → absorbed into oracle/apollo/memory/ (if still needed beyond observation reconstruction)
  • cross_service.py → deprecated; call sites migrate to Apollo guidance queries or direct Memory(UDS) reads
  • models/memory.py → deleted; any imports switch to axonis_core.userspace.intelligence.Memory

Status: not scheduled. Oracle's existing modules stay in place throughout all three Apollo phases. Consolidation requires a separate, explicitly-scoped effort once the user approves it — Apollo will not reach into or replace oracle's existing memory surface as part of this spec.

Integration Backlog

Phase 1 brings oracle and cortex into Apollo's observation scope and makes cortex (L3) the lone guidance subscriber. Oracle is the sole emitter: cortex carries no Apollo emission code, and oracle observes each MCP round-trip to it and emits tool_output / tool_error in-process on its behalf (§Ingest Semantics → Primary path). Beacon (L1) is deferred from Phase 1 because beacon has no HTTP connection to oracle today: its MCP_SERVER_URL points directly at cortex, so attached apollo_guidance has no path into beacon's process. The beacon↔oracle connection is a separate spec decision tracked in §Integration Backlog. Parallax is also deferred from Phase 1; its emitter (oracle-observed) and subscriber (ApolloGuidanceCache) wiring will follow the same pattern as cortex when it onboards.

Additional services become visible to Apollo through one of two mechanisms, chosen per-service based on how oracle reaches them:

  • In-process relay (default). Any service oracle MCP-dispatches to — i.e., any service registered on ServiceRegistry and reachable through oracle's tool-use or MCP-proxy paths — is automatically observed by oracle with no code changes in that service. Onboarding is a single-line addition: setting component_kind on the service's ServiceRegistry record.
  • Direct POST via ApolloClient (fallback). Services whose outputs are not observable through an oracle-mediated MCP round-trip (batch jobs, out-of-process workers, federated emitters, etc.) POST envelopes to /api/v1/apollo/observations themselves. No Phase-1 service uses this path.

Until one of these two mechanisms is wired for a service, its outputs are invisible to Apollo.

component_kind classification

Every L3 service that registers with oracle's ServiceRegistry must declare a component_kind:

  • agent — has its own LLM, makes prompt-driven decisions, receives apollo_guidance attached to every MCP dispatch from oracle, and has an intent contract against which schema_mismatch (Evaluator signal 2) is evaluated.
  • library — no LLM, purely operational (CRUD, compute, I/O). Emits tool_output / tool_error observations but does not receive apollo_guidance in its MCP dispatches (oracle filters it out before serialization) and is not subject to the schema_mismatch signal. Evaluator signal 1 (hard error) still applies.

The classification is a field on the ServiceRegistry record (server/mcp/registry.py). It is set at registration time and may be changed by the owning team via re-registration. Apollo treats ServiceRegistry as the single source of truth — no separate Apollo registration is used (§Invariants item 16).
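
A minimal sketch of how the classification gates attachment on an outbound MCP dispatch. build_dispatch and its arguments are hypothetical names; only the agent / library values and the apollo_guidance field come from the spec.

```python
from typing import Any, Optional

VALID_KINDS = ("agent", "library")


def build_dispatch(
    payload: dict[str, Any],
    component_kind: str,
    guidance: Optional[dict[str, Any]],
) -> dict[str, Any]:
    """Attach apollo_guidance only for agent-kind services."""
    if component_kind not in VALID_KINDS:
        raise ValueError(f"unknown component_kind: {component_kind!r}")
    out = dict(payload)  # copy so the caller's payload is not mutated
    if component_kind == "agent" and guidance is not None:
        out["apollo_guidance"] = guidance
    return out
```

Because the kind is read per dispatch, a re-registration with a changed component_kind would take effect on the next call without any Apollo redeploy.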

Status table

Classifications below are initial best-guesses; each service's owning team confirms on integration. Misclassification is low-risk: a library mistakenly tagged agent will simply see apollo_guidance attached to its dispatches and ignore it (no LLM to read it); an agent mistakenly tagged library will miss guidance it could have used. Either is corrected by updating the ServiceRegistry record.

| Service | Kind | Status | Notes |
|---|---|---|---|
| oracle | n/a (L2) | Phase 1 | In-process emission; no network call; not an L3 subscriber |
| cortex | agent | Phase 1 | Query-adjacent reasoning; Phase 1 subscriber wiring lands in M14 |
| parallax | agent | Deferred | Same pattern as cortex when onboarded; fusion-run execution; LLM-driven workflow |
| fedai-rest | library | Pending | Dataset CRUD; ops / libs host; emits on dataset read/write + op outcomes |
| testament | TBD | Pending | Kind + emitter integration TBD by team |
| titan | TBD | Pending | Kind + emitter integration TBD by team |
| rest / fedai-rest | library | Pending | Federation REST layer; emits on federated request outcomes |

Onboarding a pending service requires no Apollo code changes and — when the service is reachable from oracle's MCP dispatch path — no code changes in the pending service either. The only required artifact is the component_kind declaration on that service's ServiceRegistry record. Services on the fallback path additionally import ApolloClient and emit from their own process.

Environment Configuration

Apollo does not redefine any env var that already exists in the platform deployment layer. The canonical source for deployment-level configuration is developers-environment/conf/*.env — one file per target (development.axonis.ai.env, matrix.axonis.ai.env, edge.axonis.ai.env, vector.axonis.ai.env, etc.). Every target ships a consistent platform baseline; Apollo inherits it transitively through axonis-core, oracle, and its own storage/logger dependencies.

Inherited platform variables (not Apollo's to define)

| Variable(s) | Consumer | Apollo's use |
|---|---|---|
| ELASTIC_HOST, ELASTIC_USERNAME, ELASTIC_PASSWORD, ELASTIC_VERIFY, ELASTIC_TIMEOUT, TEMPLATES_DIR, ELASTIC_PKI_CA | axonis.elastic.Elastic | Storage for apollo_observations, apollo_artifacts, apollo_graph_*, apollo_audit. Every Memory(UDS) subclass in apollo/memory/store.py inherits this config. |
| REDIS_URL (oracle-style) or REDIS_HOST + REDIS_PORT + REDIS_PASSWORD + REDIS_DB + REDIS_TLS + REDIS_VERIFY (platform-standard) | oracle/server/memory/*, axonis.redis.Redis | Oracle's ConversationStore + CrossServiceMemory; unused directly by Apollo. |
| SSO_CLIENT_ID, SSO_CLIENT_SECRET, SSO_TOKEN_URL, SSO_WELLKNOWN, SSO_INTROSPECT_URL, SSO_VERIFY | oracle's OAuthMiddleware (+ axonis.auth) | Validates Bearer tokens on every request reaching /api/v1/apollo/*. No Apollo-specific auth config. |
| ATLAS_LOG_LEVEL, ATLAS_WORKSPACE, AXONIS_LOG_LEVEL, AXONIS_WORKSPACE | axonis.logger (§Logging) | Log level + log-file root for Apollo's three logger streams (log/error/audit). oracle/tests/conftest.py also respects ATLAS_WORKSPACE for test-session log placement. |
| FEDERATE_DOMAIN, FEDERATE_NAME, FEDERATE_UUID, FEDERATE_PARTY_*, FEDERATE_PROTOCOL_*, FEDERATE_WORK_MODE_* | axonis.uds federation hooks | Picked up automatically if/when Apollo artifacts start federating (post-Milestone 13). No Apollo-specific federation config. |

Apollo-owned variables (all APOLLO_*)

Canonical location: developers-environment/conf/*.env — specifically the shared dev-env file (development.axonis.ai.env) plus any target-specific overrides (matrix.axonis.ai.env, vector.axonis.ai.env, edge.axonis.ai.env). Every APOLLO_* variable is declared there with a production-ready default. oracle/apollo/settings.py reads them via os.getenv(...) with fall-back defaults that match the env-file values, so if the shared env is unsourced the system still comes up sensibly — but the authoritative source is the deployment env file.

Why it lives in the shared env file rather than per-service: Apollo's observation path runs in oracle, but its configuration surface informs the contract every other service consumes (guidance attach budgets, trace-propagation expectations, retention windows). Keeping the defaults in the shared env file means oracle, parallax, cortex, and beacon all load the same baseline — an operator flipping APOLLO_CURATOR_AUTONOMOUS=true in the shared file affects the whole deployment consistently.

Every APOLLO_* variable Apollo's settings.py reads is mirrored in the env file. Grouped by subsystem:

  • LLM: APOLLO_LLM_PROVIDER, APOLLO_LLM_MODEL, APOLLO_LLM_API_KEY, APOLLO_LLM_BASE_URL (+ reserved APOLLO_LLM_LOCAL_MODEL_PATH, not yet implemented)
  • Synthesis: APOLLO_SYNTHESIS_MAX_CONCURRENT
  • Guidance attach: APOLLO_GUIDANCE_ATTACH_ENABLED, APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS, APOLLO_GUIDANCE_TIMEOUT_MS
  • Ingest client side: APOLLO_INGEST_BATCH_SIZE, APOLLO_INGEST_FLUSH_INTERVAL_MS, APOLLO_INGEST_POST_TIMEOUT_SEC, APOLLO_INGEST_RETRY_ATTEMPTS, APOLLO_INGEST_RETRY_BASE_MS, APOLLO_INGEST_RETRY_CAP_MS
  • Ingest server side: APOLLO_INGEST_QUEUE_MAXSIZE, APOLLO_INGEST_WORKER_CONCURRENCY, APOLLO_INGEST_WORKER_RETRY_ATTEMPTS, APOLLO_INGEST_DEAD_LETTER_PATH, APOLLO_INGEST_STALE_WARN_SEC, APOLLO_INGEST_DEPTH_WARN, APOLLO_INGEST_DEDUPE_WINDOW_SEC, APOLLO_EMITTER_ENABLED
  • Decision Graphs: APOLLO_GRAPH_SNAPSHOT_INTERVAL, APOLLO_GRAPH_EWMA_SHORT, APOLLO_GRAPH_EWMA_LONG, APOLLO_GRAPH_TRACE_STATE_TTL_SEC
  • Evaluator: APOLLO_EVALUATOR_WEIGHT_L3_ERROR, APOLLO_EVALUATOR_WEIGHT_SCHEMA_MISMATCH, APOLLO_EVALUATOR_WEIGHT_USER_FEEDBACK, APOLLO_EVALUATOR_WEIGHT_CONFIDENCE, APOLLO_EVALUATOR_L3_FAST_DEMOTE_N, APOLLO_EVALUATOR_NORMAL_DEMOTE_N
  • Curator: APOLLO_CURATOR_AUTONOMOUS, APOLLO_CURATOR_AUTO_INTERVAL_SEC, APOLLO_COMPACTION_AUTO
  • Audit: APOLLO_AUDIT_RETENTION_DAYS, APOLLO_INJECTION_AUDIT_HEARTBEAT_INTERVAL
  • Maintenance: APOLLO_MAINTENANCE_INTERVAL, APOLLO_OBSERVATION_RETENTION_DAYS
  • Trace propagation: APOLLO_TRACE_HEADER, APOLLO_REQUIRE_TRACEPARENT
  • Observation obligations: APOLLO_REQUIRE_INTENT_SCHEMA
  • Drift detection: APOLLO_DRIFT_Z_SCORE_THRESHOLD, APOLLO_DRIFT_NEW_NODES_PER_HOUR_CAP, APOLLO_DRIFT_TRAJECTORY_TOLERANCE
  • Integration (ApolloClient): APOLLO_BASE_URL

None of these duplicate a platform variable. When adding a new APOLLO_*, add it to both oracle/apollo/settings.py (with its default) and the shared env file (with the same default) in one commit.
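
The os.getenv-with-fallback pattern described for oracle/apollo/settings.py could look like the sketch below. The helper names and the returned dict are hypothetical. The 10 ms attach budget is the default stated in this spec, and openai is the provider this spec names as the default production pattern; the curator default is an assumption.

```python
import os


def _env_bool(name: str, default: bool) -> bool:
    raw = os.getenv(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def _env_int(name: str, default: int) -> int:
    raw = os.getenv(name)
    return default if raw is None else int(raw)


def load_settings() -> dict[str, object]:
    """Read a few APOLLO_* variables with fall-back defaults."""
    return {
        # 10 ms is the default stated for the attach budget in this spec.
        "guidance_attach_timeout_ms": _env_int(
            "APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS", 10
        ),
        # Assumption: autonomous curation defaults to off.
        "curator_autonomous": _env_bool("APOLLO_CURATOR_AUTONOMOUS", False),
        "llm_provider": os.getenv("APOLLO_LLM_PROVIDER", "openai"),
    }
```

Keeping each default next to its variable name makes it easy to check, in the same commit, that the value matches the shared env file.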

Per-deployment overrides

Each *.env in developers-environment/conf/ targets a specific deployment (development, matrix, vector, edge, etc.). The shared development.axonis.ai.env holds the baseline; production targets override via their own file. Any Apollo variable that needs to differ per target lives in the target-specific env — never hardcoded into settings.py. Operators change behavior by editing the env file and reloading, not by shipping code.

Dependencies

[project]
dependencies = [
    # inherited from oracle (see component.oracle.gateway)
    "axonis-core",
    "fastapi>=0.110.0",
    "starlette>=0.36.0",
    "redis>=4.0.0",
    "anthropic>=0.40.0",
    "openai>=1.0.0",

    # apollo-specific
    "sentence-transformers>=3.0.0",   # embeddings
    "numpy>=1.24",                    # dense-vector math
]

LLM provider SDK. Apollo's current default LLM is MiniMax M2.7 (see §Apollo's LLM). MiniMax exposes an OpenAI-compatible API, so Apollo reaches it via the existing openai client with APOLLO_LLM_BASE_URL pointed at the MiniMax endpoint — no new SDK dependency is added. If a future model swap requires a non-OpenAI-compatible provider, an additive dependency joins oracle's existing provider set.

Apollo introduces no new top-level dependencies beyond libraries already declared in oracle's pyproject.toml; it activates existing dependencies (notably sentence-transformers and numpy) that oracle already includes.

Invariants

  1. Apollo does not execute workflows. It observes, learns, and advises. It never calls tools, never invokes backend services, never retries a failed request. Layer 1 drives iteration.
  2. Curator empowerment is bounded to Apollo's own artifacts. Curator cannot change auth, guardrails, token scopes, user conversations, or any non-Apollo state.
  3. Every autonomous action is auditable. No Curator mutation occurs without a record in apollo_audit.
  4. Apollo is internal. No Apollo endpoint is exposed outside the cluster except through oracle's existing external surface. Oracle remains the only externally exposed service (component.oracle.gateway invariant 1).
  5. Apollo failures do not break oracle. Guidance attachment has a hard timeout (APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS); on overshoot or internal Apollo failure, oracle serializes the response / MCP dispatch without apollo_guidance. Ingest failures never block the emitter's task and are surfaced as metrics (apollo_ingest_post_failure_total, apollo_ingest_queue_dropped_total) on /stats — never silent.
  6. Apollo uses axonis-core's Memory UDS as its storage primitive. It does not re-implement or bypass the UDS pattern from platform.axonis-core.
  7. Admin chat is the only conversational surface. Role admin is required. Non-admin users interact with Apollo only transitively via oracle.
  8. Observation cadence is coarse by design. Token-level observations are prohibited. Turn-level, tool-level, error-level, and response-level only.
  9. Axonis-core remains ML-free. Any future ML dependencies (e.g., embedding generation) live in oracle/apollo/, not in axonis-core. platform.axonis-core invariant 1 is preserved.
  10. Artifacts are versioned; graphs are snapshotted. Every Curator mutation to an artifact creates a new version in apollo_artifact_history; graph-level rollback uses the hourly/daily/weekly snapshot tiers. Rollback is always possible.
  11. Oracle's existing memory modules are not modified. Apollo is additive and coexists with oracle/server/memory/* and oracle/server/models/memory.py throughout all three phases. Consolidation is deferred and out of scope (§Deferred: Consolidation of Oracle Memory Modules).
  12. Apollo's LLM is pluggable. No MiniMax-specific assumptions in prompts, input shapes, or response parsers. Model swap is a config change via APOLLO_LLM_PROVIDER / APOLLO_LLM_MODEL, never a code change.
  13. Layer 3 performance is the strongest failure signal. Evaluator weighting amplifies L3 errors and schema mismatches over softer signals, accelerates demotion on L3-dominant score drops, and cascades to flag upstream artifacts for synthesis review.
  14. Neither L1 nor L3 addresses Apollo directly. L1 talks to oracle; L3 talks to oracle (via MCP); oracle talks to Apollo. L1 and L3 hold no Apollo endpoint knowledge, no Apollo credentials, and make no Apollo calls on any in-production path. Oracle is Apollo's sole emitter for all Phase-1 events: L1-origin observations (intent_schema, user_prompt, user_feedback) are emitted by oracle in-process after oracle receives the corresponding signal from L1; L3-origin observations (tool_output, tool_error) are emitted by oracle in-process after the MCP round-trip to an L3 service returns. Guidance flows the same way in reverse: it reaches L1 attached to /chat responses, reaches oracle's own chat LLM in-process (no transport, since oracle hosts Apollo), and reaches L3 attached to outbound MCP tool dispatches. The POST /api/v1/apollo/observations endpoint exists as a secondary path for admin replay/seed and for future services running outside oracle's MCP dispatch reach; Phase-1 emitters do not use it. No long-lived connections, no service tokens, no push channel in production.
  15. Injection cannot execute code in any subscriber. Attached apollo_guidance (or in-process cache contents on the L2 path) carries artifact data only. Subscribers update a local cache and consult it on their next LLM turn. Apollo cannot force a subscriber to act, call a tool, or mutate any state beyond its own cache.
  16. No subscriber registry, no push channel. Apollo has no list of subscribers to push to. Guidance is delivered by oracle attaching the current applicable set to every response/dispatch leaving oracle (L1 attach, L3 attach) and consulted in-process by oracle's own chat LLM on the L2 path. L3 agent eligibility is still governed by component_kind on the ServiceRegistry record (libraries are filtered out before attachment); L1 eligibility is implicit (every /chat response carries L1 guidance); L2 consumption is implicit (oracle's tool-executor consults the local cache before every turn).
  17. Apollo is the cross-service knowledge transfer channel. MemoryService (axonis-core) is strictly per-service — every recall is scoped to the calling service's (user_id, service). A preference, fact, or instruction expressed to one service is never directly readable by another. When the same intent needs to shape behaviour across services (e.g. "user prefers concise responses" expressed to beacon should also bias oracle), Apollo's observation stream picks it up, synthesis distills it into an artifact (e.g. a PromptShim with applicability.service_name = null for cross-service scope), and the guidance attach channel delivers it to every applicable subscriber. Apollo never instantiates MemoryService for cross-service reads — its view is the observation stream, which inherently spans all services. This separation means cross-service knowledge transfer is always curated, audited, and reversible (demote / forget) rather than implicit through silent shared-index reads.
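
Invariants 15 to 17 describe a subscriber-side cache that holds artifact data only and treats a null service_name as cross-service scope. A minimal sketch follows, assuming a flat Artifact record and a single accessor; all names and fields here are hypothetical and simplified from the real ApolloGuidanceCache.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Artifact:
    id: str
    kind: str                            # e.g. "PromptShim"
    weight: float
    updated_ts: float
    body: str
    service_name: Optional[str] = None   # None means cross-service scope


class GuidanceCache:
    """Pure in-process data structure: no transport, no auth state."""

    def __init__(self) -> None:
        self._artifacts: list[Artifact] = []

    def update(self, artifacts: Iterable[Artifact]) -> None:
        # Replace the full set, so repeating the same update is idempotent.
        self._artifacts = list(artifacts)

    def for_service(self, service: str, kind: str) -> list[Artifact]:
        matches = [
            a for a in self._artifacts
            if a.kind == kind and a.service_name in (None, service)
        ]
        # Order by weight descending, then recency descending.
        return sorted(matches, key=lambda a: (-a.weight, -a.updated_ts))
```

The cache only stores and returns data; nothing in it can trigger a tool call or a state change in the subscriber, which is the property invariant 15 requires.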

Test Expectations

  • Observer tests: each event type round-trips correctly through ingest; trace_id and parent_trace_id stitching works; cadence limits are enforced (no token-level events accepted); every Phase-1 event — L1-origin (intent_schema, user_prompt, user_feedback), oracle's own (llm_turn, final_response), and L3-origin (tool_output, tool_error) — arrives via oracle's in-process emission path only. Oracle extracts L1 signals from /chat request body and feedback submissions, observes the MCP round-trip for L3 outputs, and calls oracle.apollo.observer.ingest in-process on both layers' behalf. A direct emit from L1 credentials or from cortex to any Apollo path is rejected in Phase-1 test fixtures (§Invariants 14).
  • HTTP ingest tests (secondary path): the POST /api/v1/apollo/observations endpoint continues to function for admin replay/seed and for out-of-process emitters. ApolloClient.emit POSTs the envelope and traceparent with an appropriate Bearer token (admin token for replay, user-forwarded token for out-of-process emitters); server returns 202 as soon as the envelope is enqueued on the in-process async queue; queue overflow increments apollo_ingest_queue_dropped_total and is visible on /stats (never silent); client retries on transient failures within the configured attempt budget, then surfaces apollo_ingest_post_failure_total on permanent failure; at-least-once redelivery is deduped on (trace_id, event_type, timestamp, service) within APOLLO_INGEST_DEDUPE_WINDOW_SEC; background worker crashes move envelopes to the optional dead-letter JSONL path when retry budget exhausts; services over lag/staleness thresholds appear in degraded_emitters.
  • Memory tests: observations, artifacts, graph nodes, graph edges, and graph snapshots indices support CRUD via the axonis-core Elastic base class; embeddings generated on store; semantic recall via kNN composes with filters; expires_ts + delete_by_query maintenance task coarsens and purges correctly.
  • Graph update tests: extractors are deterministic on every observation; node/edge upserts are idempotent; EWMA short- and long-window weights update correctly; no artifacts are created on the deterministic path.
  • Synthesis tests: each event-driven trigger (L1 request, L3 output, admin chat, guidance miss, admin-initiated) invokes the LLM once; concurrent synthesis is bounded by APOLLO_SYNTHESIS_MAX_CONCURRENT; duplicate triggers within a trace_id are coalesced to the latest observation; synthesis calls receive the correct subgraph and artifact context.
  • Graph-anchor drift check tests: LLM proposals consistent with graph state are committed; proposals that contradict strongly-weighted edges are flagged as DriftEvent and held for admin review; rate-of-new-nodes cap triggers drift flagging.
  • Guidance tests: intent → artifacts matching; layer filtering; caller-permission filtering (guardrails); empty-result fallback when artifacts index is empty; 50 ms timeout on the hot path.
  • Evaluator tests: all four failure signals detected; L3 performance signals (1 and 2) weight heavier than user feedback and confidence; accelerated demotion (N=2) fires on L3-dominant score drops; upstream artifact re-flag cascade reaches IntentPattern / PromptShim / SpecFragment; repeated L3 failures escalate to DriftEvent rather than silent demotion; per-signal score decomposition is preserved in audit records.
  • Curator tests: each allowed action (promote, demote, forget, edit, rollback, compact); every disallowed action is refused (auth changes, guardrail changes, user-data access); audit record written for every mutation with actor, trigger, before/after version; curator-pause blocks all Curator mutations.
  • Versioning tests: artifact mutation copies prior version to apollo_artifact_history before overwrite; rollback restores the target version and creates a new version whose prev_version_id points at the pre-rollback state; rollback event itself appears in audit; graph snapshots restore correctly; structural graph mutations by admin are audited.
  • Admin chat tests: role gating (admin only); each chat tool executes correctly; audit log shows actor: "admin:<username>"; indefinite: true flag works for critical actions.
  • Layer 1 schema tests: best-effort mode accepts traces without intent_schema and produces inferred nodes; required mode (APOLLO_REQUIRE_INTENT_SCHEMA=true) rejects schema-less traces with 400; intent_schema_coverage stat reports correct rolling percentage.
  • LLM swap tests: provider swap via env (APOLLO_LLM_PROVIDER / APOLLO_LLM_MODEL) takes effect without code changes; MiniMax-via-OpenAI-compatible endpoint is exercised; no MiniMax-specific strings leak into prompt or response parsers.
  • Failure posture tests: admin GET /guidance times out cleanly on APOLLO_GUIDANCE_TIMEOUT_MS; APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS overshoot on the attach path causes apollo_guidance to be omitted from the response/dispatch and increments apollo_guidance_attach_timeout_total without failing the user request; ingest queue overflow returns 202 with apollo_ingest_queue_dropped_total increment (never silent); Apollo module fails to import → oracle serves /chat and dispatches without apollo_guidance, ingest returns 503; Apollo hallucinates → evaluator demotes → next attached payload reflects the demotion → admin can force-rollback.
  • Injection channel tests: oracle attaches apollo_guidance to every /chat response body when an applicable artifact set exists for the caller's L1 scope; oracle attaches apollo_guidance to every MCP dispatch bound for an agent-kind L3 service; dispatches bound for library-kind services do not carry apollo_guidance; attached payload contains as_of, artifacts, and rationale_summary (no injection_id, trigger, or evidence_ref — those are audit-only); attach-timeout overshoot (APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS) causes omission of apollo_guidance with an apollo_guidance_attach_timeout_total increment, not a request failure; Curator pause freezes the attached state (subscribers keep receiving the as-of-pause payload); L1 and L3 make no calls to Apollo endpoints in any test fixture — attempts from non-admin tokens to admin preview endpoints return 403.
  • component_kind tests: ServiceRegistry records carry a component_kind field (agent | library); oracle attaches apollo_guidance only to MCP dispatches bound for agent-kind services; library-kind services emit observations but receive no apollo_guidance in their dispatches; Evaluator signal 2 (schema_mismatch) fires only for agent-emitted tool_output with an L1 intent schema on the trace, and is skipped for library-emitted events; re-registering a service with a changed component_kind takes effect on the next dispatch without Apollo redeploy.
  • ApolloGuidanceCache SDK tests: cache.update(payload) replaces the full artifact set idempotently; each canonical accessor (get_system_prompt_additions, get_spec_fragments, get_tool_description_overrides, get_tool_pairing_hints, get_active_failure_patterns, get_service_connection_hints) returns correctly-ordered (weight desc, recency desc) results; applicability filtering narrows by intent context; empty-cache fallback returns empty lists / None without blocking; the SDK holds no transport, no HTTP client, no auth state — it is a pure in-process data structure.
  • Rationale + evidence tests: every attached apollo_guidance.artifacts[*] entry carries a non-empty rationale; every apollo_audit record carries a non-empty rationale and evidence_ref; LLM-driven actions produce synthesized rationales, deterministic Evaluator-driven actions produce templated rationales composed from score_decomposition; rationale and admin_note are distinct and both queryable; admin chat explain_decision(trace_id | artifact_id | audit_id) retrieves the stored rationale and resolves evidence_ref pointers to their underlying observations / graph snapshot / score decomposition; discuss_decision(artifact_id | audit_id) opens a chat thread with the rationale pre-loaded and permits inline action tool calls that themselves are audited with actor: "admin:<username>".
  • Trace propagation tests: L1-minted traceparent arrives unchanged at oracle; oracle forwards the header unchanged on downstream MCP and REST dispatches via axonis_core.gateway.client.extract_http_headers(); MCP context field carries traceparent end-to-end; ApolloClient stamps both the header and envelope trace_id on every POST /observations; oracle mints a replacement and logs missing_traceparent when the header is absent (best-effort); oracle returns 400 when APOLLO_REQUIRE_TRACEPARENT=true and the header is absent or malformed; envelope trace_id wins when it differs from the header; a full lineage query returns every event for a single trace_id across all emitting layers.
  • Integration tests: full lineage from Layer 1 intent_schema + user_prompt through oracle llm_turn and Layer 3 tool_output / tool_error to final_response, with observations captured at every boundary and artifacts produced by synthesis reflecting the lineage.
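
The trace propagation expectations above (forward a valid traceparent unchanged, mint a replacement in best-effort mode, reject in required mode) can be sketched as follows. This is a minimal illustration, not oracle's code: `parse_traceparent`, `mint_traceparent`, `resolve_traceparent`, and `MissingTraceparent` are hypothetical names, the regex only covers the version-00 layout of W3C Trace Context, and the sampled flag `01` on minted headers is an assumption.

```python
from __future__ import annotations

import re
import secrets

# version(2 hex) - trace-id(32 hex) - parent-id(16 hex) - flags(2 hex)
_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class MissingTraceparent(ValueError):
    """Raised in required mode; the caller would map this to HTTP 400."""


def parse_traceparent(header: str | None) -> str | None:
    """Return the trace_id of a well-formed header, or None if absent/malformed."""
    if not header:
        return None
    match = _TRACEPARENT_RE.match(header)
    if match is None:
        return None
    version, trace_id, parent_id, _flags = match.groups()
    # The W3C spec forbids version ff and all-zero trace/parent ids.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return trace_id


def mint_traceparent() -> str:
    """Mint a fresh version-00 header (flags=01 is an assumption of this sketch)."""
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"


def resolve_traceparent(header: str | None, require: bool) -> tuple[str, bool]:
    """Return (traceparent, minted). Valid headers pass through unchanged."""
    if parse_traceparent(header) is not None:
        return header, False
    if require:
        raise MissingTraceparent("traceparent absent or malformed")
    # Best-effort mode: the caller would also log missing_traceparent here.
    return mint_traceparent(), True
```

The `require` argument stands in for APOLLO_REQUIRE_TRACEPARENT; the boolean in the return value tells the caller whether to log missing_traceparent.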

Design Decisions

All 20 design decisions are now locked. Summary for reviewers:

  • Q1 (observation cadence): locked — turn + tool + error + final response, no tokens
  • Q2 (learning cadence): locked — LLM-driven primary synthesis, event-driven (triggers: Layer 1 request ingested, Layer 3 output ingested, admin chat turn, guidance miss, admin-initiated synthesis). Decision graphs update deterministically per-observation as the supplemental grounding layer. Graphs anchor the LLM: every LLM output is checked against graph state, proposals that diverge from recorded reality are flagged as drift and held for review. No timed or batched synthesis.
  • Q3 (Apollo's LLM): locked — pluggable by design. Current default MiniMax M2.7 via APOLLO_LLM_PROVIDER=minimax / APOLLO_LLM_MODEL=m2.7. Must be swappable with any newer/stronger model by env change alone; no MiniMax-specific assumptions in prompts or parsers. Budget tracked separately from user-facing chat. Apollo's LLM is independent of oracle's user-facing chat LLM (oracle/server/llm/tool_executor.py, configured via the existing 5-provider gateway). The two surfaces are distinct: /api/v1/chat runs oracle's chat LLM with Apollo guidance applied via the L2 in-process subscriber path (§L2 path); /api/v1/apollo/chat runs Apollo's MiniMax for admin synthesis/conversation with Apollo itself.
  • Q4 (ingest back-pressure): locked — HTTP POST to /api/v1/apollo/observations with a bounded client-side retry (default 2 attempts, exponential backoff + jitter). The server-side handler enqueues onto an in-process asyncio queue (capacity APOLLO_INGEST_QUEUE_MAXSIZE, default 10000) and returns 202 immediately. A pool of background workers drains the queue asynchronously. Queue overflow is counted (apollo_ingest_queue_dropped_total{service}) and surfaced on /stats, never silent. Client POST timeout (APOLLO_INGEST_POST_TIMEOUT_SEC, default 30) is generous because the server-side operation is a local memory write, not WAN I/O. Lifecycle flush on subscriber shutdown ensures short-task libraries don't drop their final batch. See §Ingest Semantics.
  • Q5 (retention): locked — 30 days raw observations; 90 days graph snapshots tiered (7d hourly → 30d daily → 90d weekly); artifacts indefinite; audit log ≥ 90 days. Expiry is application-managed via expires_ts + delete_by_query, matching axonis-core / rest/uds/ convention (no Elastic ILM, no data streams, no rollovers). Mapping files live in oracle/apollo/templates/*_mapping.json alongside the pattern from rest/uds/templates/.
  • Q6 (spec staging): locked — single component.oracle.apollo with phases marked inline (§Phased Rollout). Matches the structure of platform.axonis-core/02/03. The LLM and graph anchor are a closed system; splitting would force cross-spec forward references.
  • Q7 (starter services): locked — Phase 1 emitters: oracle + cortex. Oracle is the sole emitter — oracle observes the /chat envelope (L1) and the MCP round-trip to cortex (L3) and calls the in-process observer directly. Cortex carries no Apollo emission code beyond the component_kind declaration on its ServiceRegistry record. Phase 1 subscriber: cortex (L3) — consumes apollo_guidance via ApolloGuidanceCache per Q20. Beacon (L1) is deferred: beacon has no HTTP connection to oracle today (its MCP_SERVER_URL defaults to cortex direct), so guidance has no path into beacon's process until the beacon↔oracle connection is designed. Parallax (L3) is deferred from Phase 1 to a follow-on phase; its emitter and subscriber wiring follow the same pattern as cortex when it onboards. Remaining services tracked in §Integration Backlog; each onboards via whichever §Ingest Semantics path fits (in-process relay when oracle MCP-dispatches to it — the default — or ApolloClient POST when it does not).
  • Q8 (existing memory modules): locked — coexist. Apollo is additive; oracle's existing memory modules (conversation.py, cross_service.py, models/memory.py) remain untouched throughout Apollo's rollout. Absorption/deprecation recommendation captured in §Deferred: Consolidation of Oracle Memory Modules, flagged as not scheduled.
  • Q9 (failure signals): locked — all four signals feed the Evaluator (L3 error, L3 schema mismatch, user feedback, evaluator confidence), weighted. Layer 3 performance carries an amplified penalty: signals 1 and 2 use a heavier weight tier (default 3.0 vs 1.5 for user feedback, 0.5 for confidence), trigger accelerated demotion (N=2 cycles instead of N=5), flag upstream artifacts (IntentPattern, PromptShim, SpecFragment) for LLM review on next synthesis, and escalate to DriftEvent on repeated failures. Rationale: poor L3 performance indicates workflow generation and artifacts need updating, not a slow drift.
  • Q10 (audit log storage): locked — Elastic apollo_audit index (flat, UDS-backed, expires_ts + delete_by_query, per §Retention). Default 90-day retention configurable via APOLLO_AUDIT_RETENTION_DAYS. Records can be marked indefinite: true for critical admin actions (forget, pause/resume, rollback) — null expires_ts, never deleted. Schema captures action, actor, trigger, before/after versions, full per-signal score decomposition, upstream-artifact flags, and optional admin notes. Queryable via GET /api/v1/apollo/audit with rich filters.
  • Q11 (artifact versioning): locked — two-tier model, in place from Phase 1. Artifacts versioned per-mutation: current version in apollo_artifacts, prior versions copied to apollo_artifact_history (no expires_ts, indefinite). Every artifact carries version, prev_version_id, change_reason, actor. Rollback via POST /api/v1/apollo/artifacts/{id}/rollback, itself a versioned + audited event. Graphs use snapshot-based rollback instead of per-mutation versioning (hourly/daily/weekly tiers per Q5). Structural graph mutations by admin or Curator are audited.
  • Q12 (layer 1 obligation): locked — best-effort in Phase 1 and Phase 2 (default). Apollo accepts traces without intent_schema; extractors fall back to prompt-inference and mark inferred nodes with reduced weight. Signal 2 (schema_mismatch) is dark for schema-less traces; L3-performance penalty still fires on signal 1. GET /apollo/stats exposes intent_schema_coverage so admins can see when coverage is high enough. Promote to required via APOLLO_REQUIRE_INTENT_SCHEMA=true (config flip, no code change) — natural at Phase 3 when Curator empowerment goes live.
  • Q13 (guidance delivery): locked — symmetric across all three LLM tiers, with the transport appropriate to each layer. L1: response-attached on /api/v1/chat (beacon reads apollo_guidance from the response body and updates its local cache). L2: in-process — oracle's chat LLM (oracle/server/llm/tool_executor.py) consults a process-local ApolloGuidanceCache before each turn; no transport needed since oracle hosts Apollo. L3: MCP-arg-attached on every outbound tool dispatch bound for an agent-kind service (the agent pops the field, updates its request-scoped cache, and dispatches). No long-lived connections, no service-token infrastructure — guidance rides the ambient auth of the envelope it is embedded in (or the in-process call for L2). Attach budget is bounded by APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS (default 10 ms); overshoot omits the field/call without failing the request. Admin inspection preserved via GET /api/v1/apollo/guidance?scope=... (admin-only).
  • Q14 (L3 taxonomy): locked — L3 is not homogeneous. Every L3 service declares component_kind on its ServiceRegistry record: agent (has an LLM; subscribes to injections; subject to schema_mismatch signal) or library (no LLM; emits observations only; not subscribed; not schema-evaluated). Apollo filters subscriber enumeration and Evaluator signal-2 application by this field. ServiceRegistry is the single source of truth; Apollo introduces no parallel registry. Misclassification is recoverable by re-registering the service — no Apollo code change or redeploy needed.
  • Q15 (application contract): locked — ApolloGuidanceCache is a pure-stdlib in-process data structure with an update(apollo_guidance_block) sink (called by the subscriber's request handler when an inbound envelope carries apollo_guidance) and a fixed set of canonical accessors (get_system_prompt_additions, get_spec_fragments, get_tool_description_overrides, get_tool_pairing_hints, get_active_failure_patterns, get_service_connection_hints). Artifacts are ordered by (weight desc, recency desc); merge policy past ordering is the agent's choice. No transport, no HTTP client, no auth state inside the module — the file imports only typing and __future__. Distribution model: the canonical reference lives in axonis-core (axonis/apollo/guidance_cache.py). Subscribers SHOULD import directly from axonis-core (from axonis.apollo.guidance_cache import ApolloGuidanceCache) — this is oracle's and cortex's path. Vendoring is allowed but not preferred: if a future subscriber's dependency posture rules out taking on axonis-core (a different language runtime, a strict-isolation deployment, etc.), it MAY vendor the module under its own namespace; vendored copies must preserve the canonical contract verbatim and the subscriber owns drift detection. Cortex briefly vendored during M14 development, then unified on the canonical import once axonis-core was added to its pyproject.toml. Standardizes cross-service application by giving every L1/L3 agent the same SDK shape; only the import path varies.
  • Q16 (rationale + audit conversation): locked — every Curator action writes an apollo_audit record with a rationale (LLM-synthesized for LLM-driven proposals; templated from score decomposition for deterministic Evaluator actions) and evidence_ref (observations, graph snapshot id, score decomposition, related drift events). The per-artifact rationale also travels with each artifact on response/dispatch-attached payloads so agents may log it when applying; evidence_ref stays in the audit record only to keep the on-wire payload small. Admin chat exposes explain_decision(trace_id | artifact_id) and discuss_decision(artifact_id) so admins can conversationally review Apollo's reasoning and act on findings inline. rationale is why Apollo acted; admin_note (optional) is why the admin acted. Both are preserved in the audit index.
  • Q17 (trace propagation): locked — W3C Trace Context (traceparent header) end-to-end across L1 → L2 → L3. L1 mints; oracle forwards unchanged as an HTTP header on every downstream MCP / REST dispatch via an additive extension to axonis_core.gateway.client.extract_http_headers(). ApolloClient stamps both the header and the envelope trace_id on every observation POST (envelope wins on conflict). Greenfield: no existing tracing header in axonis-core or platform.axonis-core/02/03 is displaced. APOLLO_REQUIRE_TRACEPARENT=false through Phases 1–2 (oracle mints on absence, logs missing_traceparent); flip to true in Phase 3 alongside APOLLO_REQUIRE_INTENT_SCHEMA. No OpenTelemetry SDK dependency introduced; a future OTel integration consumes the same header without change. Realizes the OpenTelemetry aspiration noted in component.oracle.gateway.
  • Q18 (L1 + L3 emission path): locked — L1 → Oracle → Apollo and L3 → Oracle → Apollo. Neither L1 nor L3 addresses Apollo directly. Oracle is Apollo's sole emitter for every Phase-1 event type: intent_schema, user_prompt, and user_feedback are emitted by oracle in-process when oracle receives the corresponding signal from L1 via its existing API surface (/chat body, feedback submission); tool_output and tool_error are emitted by oracle in-process when oracle's MCP dispatch to an L3 service returns. L1 and L3 hold no Apollo endpoint knowledge and no Apollo credentials; they do not participate in Apollo ingest on any in-production path. ApolloClient is used only by admin replay/seed and by future out-of-process emitters outside oracle's MCP dispatch reach (§Ingest Semantics → Secondary path); Phase-1 emitters call oracle.apollo.observer.ingest in-process. Rationale: preserves component.oracle.gateway invariant 1 (oracle is the only externally visible service), keeps the L1 and L3 API surfaces narrow, and avoids requiring either layer to know about Apollo at all.
  • Q19 (ingest auth): locked — the primary in-process path has no network-auth boundary: oracle authenticates the user at the /chat / MCP ingress via OAuthMiddleware and then calls the Apollo observer as a direct in-process function call, so no additional token is required on emission. The secondary HTTP path (admin replay + out-of-process emitters) authenticates via a Bearer token — an admin token for replay/seed, or a user-forwarded token for an out-of-process emitter (the same token the service would have received on its inbound request). No Apollo-specific credential is issued; no new service-token mechanism is introduced. Background ingest without a user context (batch workers, standalone pipelines) is deferred pending component.oracle.gateway's Keycloak client-credentials work. See §Authentication & Authorization.
  • Q20 (subscriber consumption contract): locked — every subscriber that runs an LLM consumes apollo_guidance by instantiating an ApolloGuidanceCache (axonis-core), calling cache.update(payload) on every inbound envelope that carries the field, and reading the canonical accessors before its next LLM call. Mandatory accessors per LLM call: get_system_prompt_additions(intent_context) (each returned PromptShim's content.text is appended to the system prompt) and get_tool_description_overrides(tool_name) (applied per tool while rendering the tool catalog). Optional accessors (consumed where the agent's domain warrants): get_spec_fragments, get_tool_pairing_hints, get_active_failure_patterns, get_service_connection_hints. Cache lifetime is layer-dependent: session-scoped at L1 (one cache per chat session, kept across turns, refreshed each time oracle's /chat returns a new payload) and request-scoped at L3 (one cache per inbound MCP tool call, populated from arguments.apollo_guidance, discarded after the tool returns). Failure posture: missing or malformed apollo_guidance is a no-op — cache.update(None) is legal, accessors return empty lists / None, and the LLM call proceeds with no guidance applied. Phase 1 subscriber: cortex (L3). Beacon's L1 wiring is deferred — beacon currently has no HTTP connection to oracle (its MCP_SERVER_URL defaults to cortex direct, not oracle), so there is no path through which apollo_guidance can reach beacon's process today. A separate spec decision must define how beacon becomes an oracle client before the L1 contract can be exercised. Parallax's L3 subscriber wiring is deferred to a later phase (same pattern as cortex when it lands). Rationale: pins the consumption contract that Q15 left open ("merge policy past ordering is the agent's choice"), so subscriber tests can prove guidance actually changes a downstream LLM call rather than being attached and discarded.
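
The weighting and accelerated-demotion rule in Q9 can be illustrated with a small sketch. The weight tiers (3.0 / 1.5 / 0.5) and the cycle counts (N=2 vs N=5) come from Q9; everything else is an assumption made for illustration: the signal key names, the idea that each signal is a failure intensity in [0, 1], the `threshold` that marks a cycle as failing, and the "more than half of the penalty" definition of L3-dominant.

```python
from __future__ import annotations

# Weight tiers from Q9; the dictionary keys are hypothetical names.
WEIGHTS = {
    "l3_error": 3.0,              # signal 1
    "l3_schema_mismatch": 3.0,    # signal 2
    "user_feedback": 1.5,
    "evaluator_confidence": 0.5,
}
L3_SIGNALS = ("l3_error", "l3_schema_mismatch")


def score_cycle(signals: dict[str, float]) -> dict:
    """Weight each signal and keep the per-signal decomposition for audit."""
    decomposition = {name: w * signals.get(name, 0.0) for name, w in WEIGHTS.items()}
    penalty = sum(decomposition.values())
    l3_part = sum(decomposition[name] for name in L3_SIGNALS)
    return {
        "penalty": penalty,
        "decomposition": decomposition,
        # Assumption: L3-dominant means L3 signals carry over half the penalty.
        "l3_dominant": penalty > 0 and l3_part > penalty / 2,
    }


def should_demote(history: list[dict], threshold: float = 1.0) -> bool:
    """Demote after 2 consecutive L3-dominant failing cycles, else after 5."""
    failing = []
    for cycle in reversed(history):      # walk back from the newest cycle
        if cycle["penalty"] < threshold:
            break                        # streak of failing cycles ends here
        failing.append(cycle)
    if len(failing) >= 2 and all(c["l3_dominant"] for c in failing[:2]):
        return True
    return len(failing) >= 5
```

Keeping `decomposition` in the returned record mirrors the requirement that the per-signal score decomposition is preserved in audit records.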

Implementation Plan (milestone history)

Companion to: component.oracle.apollo-APOLLO.md (design) Scope: a step-by-step build order for the Apollo package inside oracle. Each milestone is a small, reversible, independently shippable slice. Later milestones layer onto earlier ones; no milestone depends on work that appears later. Audience: the engineer(s) implementing Apollo. This document answers "what do I build first, and in what order."

Principles

  1. Ship in small reversible slices. Every milestone ends with a mergeable PR that leaves oracle functional whether or not later milestones land.
  2. Stand up the skeleton before the brain. Observation intake and deterministic grounding come before LLM synthesis; the system must be able to record reality before it tries to reason about reality.
  3. Oracle stays working throughout. Apollo is additive. Do not modify oracle's existing routes, middleware, memory modules, or chat behavior except where explicitly scoped (ChatResponse extension, MCP dispatch argument injection, in-process observer calls).
  4. Follow established axonis conventions without exception. Pydantic for envelopes, Memory(UDS) for Elastic-backed classes, axonis_core.elastic.Elastic for CRUD, OAuthMiddleware for auth, HTTP + Bearer for transport, JSON templates under templates/ for index mappings. If a new pattern is tempting, stop and find the existing one first.
  5. Every milestone ends with tests. The design spec's §Test Expectations is the canonical test list; implement the subset relevant to each milestone as you go. Do not defer tests to the end.
  6. Observability from day one. /stats endpoint exists from Milestone 1; every metric named in the design spec has a zero-valued counter registered at startup, so dashboards can be built before the code that drives them.
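
Principle 6 (zero-valued counters registered at startup) can be sketched with a tiny in-process registry. `CounterRegistry` is a hypothetical helper written for this illustration, not an existing oracle or axonis-core class; the counter names are taken from the design spec, and labels such as `{service}` are left out for brevity.

```python
import threading


class CounterRegistry:
    """Pre-registers every counter at zero so /stats is complete from startup."""

    def __init__(self, names):
        self._lock = threading.Lock()
        self._values = {name: 0 for name in names}

    def inc(self, name, amount=1):
        with self._lock:
            if name not in self._values:
                # Unregistered metrics are a bug: dashboards would never see them.
                raise KeyError(f"counter not registered at startup: {name}")
            self._values[name] += amount

    def snapshot(self):
        """Return a copy suitable for serializing into a /stats response."""
        with self._lock:
            return dict(self._values)


COUNTERS = CounterRegistry([
    "apollo_ingest_accepted_total",
    "apollo_ingest_queue_dropped_total",
    "apollo_ingest_post_failure_total",
    "apollo_guidance_attach_timeout_total",
])
```

Rejecting unregistered names is a design choice of this sketch: it forces every metric to exist at zero before any code path increments it.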

Milestone map

| # | Milestone | Why it comes here |
| --- | --- | --- |
| 0 | Package scaffolding + settings + dependencies | Nothing else compiles without this. |
| 1 | Observation intake (HTTP POST + async queue + Elastic writes) | Apollo needs to receive and persist raw observations before anything downstream can work. |
| 2 | Deterministic graph updates (no LLM) | The grounding layer — lets Apollo record reality without interpreting it. Drift check in later milestones depends on this. |
| 3 | Guidance attach plumbing (empty artifact set OK) | Wire apollo_guidance into /chat responses and MCP dispatches with an always-empty artifact set; proves end-to-end wiring without needing any learned artifacts. |
| 4 | Subscriber SDK (ApolloGuidanceCache in axonis-core) | Agents need a canonical way to consume apollo_guidance. Ships independently of Apollo having anything to say. |
| 5 | Phase-1 emitter integration (oracle as sole observer for L1, L2, and L3 — all emission in-process) | Enough real traffic to populate graphs. Cortex carries no Apollo emission code; oracle observes the MCP round-trip to it and emits tool_output / tool_error on its behalf. Parallax onboards in a follow-on phase. |
| 6 | Trace propagation (W3C traceparent end-to-end) | Lineage stitching. Can land anytime after Milestone 1, but best coupled with emitter work so emitters adopt it on initial integration. |
| 7 | Admin inspection surface (CRUD endpoints + read-only chat) | Operator visibility before autonomous behavior. Admin must be able to see Apollo's state before Apollo is allowed to change its state. |
| 8 | LLM synthesis engine + graph-anchor drift check | Introduces Apollo's LLM loop. Produces artifact proposals; drift-check gate is in place from the first synthesis commit. |
| 9 | Curator commits + versioning + audit | Turns proposals into committed state. Hard invariants enforced. Every mutation versioned and audited. |
| 10 | Evaluator scoring + L3-performance amplification | Closes the feedback loop — artifacts that stop correlating with good outcomes decay. |
| 11 | Admin chat empowerment (action tools + rationale discussion) | Full admin override: explain/discuss/rollback/forget. Requires Milestones 7–9 as substrate. |
| 12 | Autonomous Curator + production drift prevention | Flip Curator to autonomous. Drift thresholds tuned. Pause/resume wired. Rollback endpoints live. |
| 13 | Maintenance + /stats polish + degraded-emitter reporting | Hourly maintenance job, snapshot coarsening, full metric surface. The ops-readiness milestone. |
| 14 | Subscriber LLM consumption (cortex L3) | Closes the L3 consumption side of the Injection Channel. M3–M5 attached apollo_guidance to MCP dispatches; M14 wires the SDK reads into cortex's tool path so guidance changes downstream LLM prompts at runtime. Locks Q20's contract for L3. Beacon (L1) is deferred — beacon has no HTTP connection to oracle today, so the L1 path is gated on a separate beacon↔oracle wiring decision. |
| 15 | Subscriber LLM consumption (oracle L2) | Closes the L2 consumption side. Oracle's chat surface at POST /api/v1/chat runs its own LLM tool-use loop (oracle/server/llm/tool_executor.py). M15 wires Apollo's in-process for_l2(...) attach plus an ApolloGuidanceCache populated each turn so oracle's chat LLM consumes guidance the same way cortex does — no transport, since oracle hosts Apollo. |

Milestones 0–6 correspond roughly to the design spec's Phase 1 ("Observe and ground"). Milestones 7–10 correspond to Phase 2 ("Synthesize and advise"). M14 + M15 retroactively complete Phase 2's Injection Channel commitment ("guidance reaches every LLM") on the consumption side; oracle's attach side has been live since M3. Milestones 11–13 correspond to Phase 3 ("Empower and maintain").


Milestone 0 — Package scaffolding, settings, dependencies

Purpose. Create the directory layout specified in §Package Structure of the design spec, wire Apollo into oracle's Starlette app as a mounted route prefix, and land the dependency additions.

Scope.
  • Create oracle/apollo/ tree per design §Package Structure (empty module files where needed).
  • Add oracle/apollo/settings.py exposing every APOLLO_* env var listed across the design spec, each with its default. All other code reads config from here, never from os.environ directly.
  • Extend oracle/pyproject.toml with sentence-transformers>=3.0.0 and numpy>=1.24. Confirm no other new dependencies.
  • Mount /api/v1/apollo/* route group in oracle/server/__main__.py. For Milestone 0, only mount GET /api/v1/apollo/stats returning {"status": "bootstrapping"} to prove wiring.
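
A settings module of the kind described in the scope could look roughly like the sketch below. It is an assumption-laden illustration: `ApolloSettings`, its field names, and `from_env` are hypothetical, only a handful of the APOLLO_* variables are shown, and a stdlib dataclass is used purely to keep the example self-contained. The defaults are the ones stated in the design decisions.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping


def _as_bool(raw: str) -> bool:
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class ApolloSettings:
    llm_provider: str = "minimax"            # APOLLO_LLM_PROVIDER
    llm_model: str = "m2.7"                  # APOLLO_LLM_MODEL
    ingest_queue_maxsize: int = 10000        # APOLLO_INGEST_QUEUE_MAXSIZE
    ingest_post_timeout_sec: float = 30.0    # APOLLO_INGEST_POST_TIMEOUT_SEC
    guidance_attach_timeout_ms: int = 10     # APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS
    audit_retention_days: int = 90           # APOLLO_AUDIT_RETENTION_DAYS
    require_intent_schema: bool = False      # APOLLO_REQUIRE_INTENT_SCHEMA
    require_traceparent: bool = False        # APOLLO_REQUIRE_TRACEPARENT

    @classmethod
    def from_env(cls, env: Mapping[str, str] | None = None) -> "ApolloSettings":
        """The single place that touches os.environ; all other code reads this."""
        env = os.environ if env is None else env
        d = cls()  # instance holding the defaults

        def get(name, cast, default):
            raw = env.get(name)
            return default if raw is None else cast(raw)

        return cls(
            llm_provider=get("APOLLO_LLM_PROVIDER", str, d.llm_provider),
            llm_model=get("APOLLO_LLM_MODEL", str, d.llm_model),
            ingest_queue_maxsize=get("APOLLO_INGEST_QUEUE_MAXSIZE", int, d.ingest_queue_maxsize),
            ingest_post_timeout_sec=get("APOLLO_INGEST_POST_TIMEOUT_SEC", float, d.ingest_post_timeout_sec),
            guidance_attach_timeout_ms=get("APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS", int, d.guidance_attach_timeout_ms),
            audit_retention_days=get("APOLLO_AUDIT_RETENTION_DAYS", int, d.audit_retention_days),
            require_intent_schema=get("APOLLO_REQUIRE_INTENT_SCHEMA", _as_bool, d.require_intent_schema),
            require_traceparent=get("APOLLO_REQUIRE_TRACEPARENT", _as_bool, d.require_traceparent),
        )
```

Passing an explicit mapping to `from_env` keeps tests independent of the process environment.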

Files created.
  • oracle/apollo/__init__.py (stub)
  • oracle/apollo/settings.py
  • oracle/apollo/observer/__init__.py, events.py, ingest.py (stubs)
  • oracle/apollo/memory/__init__.py, store.py (stubs)
  • oracle/apollo/learner/__init__.py (stub)
  • oracle/apollo/guidance/__init__.py, api.py, attacher.py, selectors.py (stubs)
  • oracle/apollo/curator/__init__.py (stub)
  • oracle/apollo/evaluator/__init__.py (stub)
  • oracle/apollo/chat/__init__.py (stub)
  • oracle/apollo/artifacts.py (stub)
  • oracle/apollo/llm.py (stub)
  • oracle/apollo/templates/ (empty directory — mappings added Milestone 1)

Files modified.
  • oracle/pyproject.toml (dependencies)
  • oracle/server/__main__.py (mount route group)

Acceptance.
  • uv run python -m server starts without error.
  • GET /api/v1/apollo/stats returns 200 with a stub JSON body.
  • No existing oracle test fails.

Rollback. Delete the oracle/apollo/ directory and revert the two modified files.


Milestone 1 — Observation intake

Purpose. Stand up observation intake: the in-process oracle.apollo.observer.ingest.ingest(...) entry point (the primary path that oracle will use from Milestone 5 onward to emit on behalf of L1, L2, and L3), the POST /api/v1/apollo/observations endpoint (the secondary path for admin replay/seed and any future out-of-process emitter), the shared in-process asyncio queue, the background-worker pool, and Elastic writes to apollo_observations. At end of this milestone, Apollo can receive observations via either path and persist them — nothing more.

Scope.
  • oracle/apollo/observer/events.py — Pydantic models for every event type in design §Observation Model (intent_schema, user_prompt, llm_turn, tool_output, tool_error, final_response, user_feedback). All inherit from a common envelope.
  • oracle/apollo/observer/ingest.py:
      • async def ingest(envelope) — in-process entry point; validates and enqueues. This is what oracle calls for every Phase-1 event it observes — L1-origin (intent_schema, user_prompt, user_feedback), L2-origin (llm_turn, final_response), and L3-origin (tool_output, tool_error) observed via oracle's MCP round-trip (oracle is Apollo's sole emitter per M5 + §Invariants 14).
      • async def _drain_worker() — background coroutine that reads from the queue and writes to apollo_observations via apollo.memory.store. Worker pool size = APOLLO_INGEST_WORKER_CONCURRENCY.
      • At-least-once dedup on (trace_id, event_type, timestamp, service) within APOLLO_INGEST_DEDUPE_WINDOW_SEC.
  • oracle/apollo/guidance/api.py — add POST /observations route. Auth: the existing OAuthMiddleware + require_auth dependency. Request body is {"observations": [envelope, ...]}; response 202 with {"accepted": N}.
  • oracle/apollo/memory/store.py — ApolloObservation(Memory(UDS)) class backed by the apollo_observations index.
  • oracle/apollo/templates/apollo_observations_mapping.json — index template matching the §Index mappings convention (UDS block, create_ts, update_ts, schema_version, expires_ts, embedding field).
  • Register zero-valued counters on startup: apollo_ingest_accepted_total, apollo_ingest_queue_dropped_total, apollo_ingest_post_failure_total, apollo_ingest_queue_depth.

ApolloClient addition (axonis-core). - New file axonis_core/gateway/apollo_client.py — thin httpx.AsyncClient wrapper with emit(envelope) method. Client-side batching (APOLLO_INGEST_BATCH_SIZE, APOLLO_INGEST_FLUSH_INTERVAL_MS), bounded retries, lifecycle flush() on atexit. No Redis, no ML deps, no new top-level dependency — uses httpx which axonis-core already has. This client targets the secondary ingest path only — admin replay/seed and future out-of-process emitters. Phase-1 emitters (oracle + cortex) never import it; oracle emits in-process (see Milestone 5).
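
The bounded-retry behaviour of the client (default 2 attempts, exponential backoff plus jitter) can be sketched independently of httpx. In this illustration `post_with_retry` and its parameters are hypothetical; `post` is any async callable that sends one batch, so the example makes no claim about ApolloClient's real method signatures. The `base_delay` and `max_delay` values are placeholders, not documented defaults.

```python
import asyncio
import random


async def post_with_retry(
    post,                 # async callable: post(batch) -> response
    batch,
    *,
    attempts=2,           # design default: 2 attempts
    base_delay=0.1,       # placeholder value for this sketch
    max_delay=2.0,        # placeholder value for this sketch
    sleep=asyncio.sleep,  # injectable for tests
    rng=random.random,    # injectable for tests
):
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return await post(batch)
        except Exception as exc:  # a real client would narrow this to transport errors
            last_error = exc
            if attempt + 1 < attempts:
                # Exponential backoff capped at max_delay, scaled by full jitter.
                delay = min(max_delay, base_delay * (2 ** attempt))
                await sleep(delay * rng())
    raise last_error
```

Because retries are bounded, a caller that exhausts them would increment apollo_ingest_post_failure_total rather than retry indefinitely.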

Acceptance.
  • Unit: envelope validation catches malformed events; dedup window suppresses duplicates.
  • Integration: a POST with 50 observations returns 202 in <10 ms; all 50 land in apollo_observations; queue-full case returns 202 with apollo_ingest_queue_dropped_total incremented; worker-crash case puts envelope on retry then to APOLLO_INGEST_DEAD_LETTER_PATH if set.
  • /stats exposes queue depth and per-service apollo_ingest_last_ingest_ts — a single timestamp covering both oracle's in-process enqueues (primary path for Phase-1 emitters) and secondary-path POSTs.
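
The "dedup window suppresses duplicates" check can be exercised against a sketch like the one below. `DedupeWindow` is a hypothetical in-memory helper keyed on the (trace_id, event_type, timestamp, service) tuple from the scope; envelopes are plain dicts here rather than the Pydantic models, and the injectable `clock` exists only to make the window testable.

```python
import time
from collections import OrderedDict


class DedupeWindow:
    """Remembers envelope keys for window_sec seconds and flags repeats."""

    def __init__(self, window_sec, clock=time.monotonic):
        self._window = window_sec
        self._clock = clock
        self._seen = OrderedDict()  # key -> first-seen time, oldest first

    def is_duplicate(self, envelope) -> bool:
        now = self._clock()
        # Evict entries that have aged out of the window (oldest first).
        while self._seen:
            oldest_key, seen_at = next(iter(self._seen.items()))
            if now - seen_at < self._window:
                break
            self._seen.popitem(last=False)
        key = (
            envelope["trace_id"],
            envelope["event_type"],
            envelope["timestamp"],
            envelope["service"],
        )
        if key in self._seen:
            return True
        self._seen[key] = now
        return False
```

A worker would call `is_duplicate` before writing to apollo_observations and skip the write when it returns True.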

Rollback. Revert the route addition and the ApolloClient file. Observations stop being accepted; oracle unaffected.


Milestone 2 — Deterministic graph updates

Purpose. Wire the five Decision Graphs (§Learner → Decision Graphs) so that every ingested observation produces graph mutations deterministically, with no LLM call. This is the grounding layer.

Scope.
  • oracle/apollo/learner/extractors.py — rule-based extractors that map observations to (graph_id, nodes_touched, edges_touched, outcome_class). Five specialized extractor paths, one per graph.
  • oracle/apollo/learner/graphs.py — DecisionGraph class wrapping the two Elastic indices (apollo_graph_nodes, apollo_graph_edges). Upsert operations with EWMA weight updates (short-window via APOLLO_GRAPH_EWMA_SHORT, long-window via APOLLO_GRAPH_EWMA_LONG).
  • oracle/apollo/learner/snapshots.py — periodic snapshot task (default hourly, APOLLO_GRAPH_SNAPSHOT_INTERVAL) writes current state to apollo_graph_snapshots.
  • oracle/apollo/learner/trajectory.py — EWMA-based projection of near-future graph state.
  • Extend oracle/apollo/observer/ingest.py background worker: after Elastic write of an observation, call into extractors and apply graph mutations.
  • In-memory mirror of active graphs for hot-path reads; rebuilt from Elastic on startup.
  • Templates: apollo_graph_nodes_mapping.json, apollo_graph_edges_mapping.json, apollo_graph_snapshots_mapping.json.

Acceptance.
  • Unit: deterministic extractors are idempotent; running the same observation twice produces identical graph state. EWMA math matches expected values for a known sequence of observations.
  • Integration: posting 1000 synthetic observations populates nodes/edges; hourly snapshot task runs and writes to apollo_graph_snapshots; restarting Apollo rebuilds the in-memory mirror from Elastic.
  • No LLM call happens on this path.
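
The EWMA and idempotency expectations above can be checked against a sketch like this one. It rests on several assumptions: APOLLO_GRAPH_EWMA_SHORT and APOLLO_GRAPH_EWMA_LONG are treated as smoothing factors (alpha) with made-up values 0.5 and 0.1, weights start at 0.0, and idempotency is achieved by remembering which observation ids were already applied. `EdgeWeight` is a hypothetical in-memory stand-in for an edge record.

```python
from dataclasses import dataclass, field


def ewma(previous: float, sample: float, alpha: float) -> float:
    """Standard exponentially weighted moving average update."""
    return alpha * sample + (1.0 - alpha) * previous


@dataclass
class EdgeWeight:
    short: float = 0.0                          # reacts quickly to recent outcomes
    long: float = 0.0                           # tracks the slower baseline
    applied: set = field(default_factory=set)   # observation ids already folded in

    def observe(self, observation_id: str, outcome: float,
                alpha_short: float = 0.5, alpha_long: float = 0.1) -> None:
        if observation_id in self.applied:
            return  # replaying the same observation must not move the weights
        self.applied.add(observation_id)
        self.short = ewma(self.short, outcome, alpha_short)
        self.long = ewma(self.long, outcome, alpha_long)
```

For the sequence of two successful outcomes (1.0, 1.0), the short weight moves 0.0 → 0.5 → 0.75 and the long weight 0.0 → 0.1 → 0.19, which is the kind of known sequence a unit test can pin.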

Rollback. Disable the extractor invocation in the background worker (feature flag on settings); graphs stop receiving updates but existing state is preserved.


Milestone 3 — Guidance attach plumbing (empty artifact set)

Purpose. Wire apollo_guidance into both delivery paths (§Injection Channel) end-to-end, returning an empty-but-well-formed payload. No artifacts yet; this milestone proves the attach mechanism without requiring the synthesis engine.

The L1 attach side wires into oracle/server/api/routes.py chat() handler (POST /api/v1/chat) — oracle's user-facing chat surface, driven by oracle's own LLM tool-use loop. This is distinct from POST /api/v1/apollo/chat (Apollo's admin chat at oracle/apollo/chat/server.py:79), which runs Apollo's separate MiniMax LLM and is not the L1 surface.

Scope.
  • oracle/apollo/guidance/attacher.py:
      • def for_l1(user, intent_context) -> dict | None — in-process; returns {"as_of": ..., "artifacts": [], "rationale_summary": ""} or None based on settings. Bounded by APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS.
      • def for_l3_agent(service_name, intent_context, tool_name) -> dict | None — same shape.
  • oracle/apollo/artifacts.py — Pydantic models for every artifact type in §Artifact types, used in the payload schema even though no artifacts exist yet.
  • Oracle ChatResponse extension in oracle/server/api/routes.py: add apollo_guidance: dict | None = Field(default=None). /chat handler calls attacher.for_l1(...) before constructing the response.
  • Oracle MCP dispatcher extension in oracle/server/mcp/server.py and oracle/server/llm/tool_executor.py: before serializing an outbound MCP tool call to an agent-kind L3 service, call attacher.for_l3_agent(...) and inject the result into the tool's arguments dict under apollo_guidance (same pattern as the existing llm_spec injection). Library-kind services are excluded from this injection.
  • component_kind field on ServiceRegistry (oracle/server/mcp/registry.py): add an agent | library field to the ToolInfo / registry record, sourced from the service's GET /service-info response or /register POST body. Default to agent if absent (safe default — unknown services treated as agents; oracle will attach guidance they ignore).
  • oracle/apollo/guidance/selectors.py — match_artifacts(intent_context, active_set) -> list[Artifact]. Runs empty today; returns empty list because active_set is empty. Implementation exists so Milestone 8 only needs to feed it a non-empty set.
  • Timeout handling: if attacher.for_l1 / for_l3_agent overshoots the budget, return None; oracle proceeds without attaching. Counter apollo_guidance_attach_timeout_total increments.
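
The component_kind gate on MCP argument injection can be sketched as a pure function plus the matching pop on the agent side. Both `inject_guidance` and `split_guidance` are hypothetical helper names; the real dispatcher and registry types are not modelled here.

```python
from __future__ import annotations


def inject_guidance(arguments: dict, component_kind: str | None,
                    guidance: dict | None) -> dict:
    """Return the arguments to send; only agent-kind services get apollo_guidance."""
    kind = component_kind or "agent"   # absent kind defaults to agent
    if kind != "agent" or guidance is None:
        return arguments               # library-kind, or nothing to attach
    # Copy so the caller's dict is not mutated by the dispatcher.
    return {**arguments, "apollo_guidance": guidance}


def split_guidance(arguments: dict) -> tuple[dict, dict | None]:
    """Agent side: pop the field before the tool sees its own arguments."""
    remaining = dict(arguments)
    guidance = remaining.pop("apollo_guidance", None)
    return remaining, guidance
```

Returning the untouched `arguments` when guidance is None also covers the attach-timeout case, where the attacher yields None and the dispatch proceeds without the field.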

Acceptance.

  • Integration: /chat response body includes apollo_guidance: {"as_of": ..., "artifacts": [], "rationale_summary": ""} when a caller is identified; null when Apollo is disabled via settings.
  • Integration: MCP tool-call dispatch to an agent-kind service carries apollo_guidance inside arguments; dispatch to a library-kind service does not.
  • Integration: with APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS=1, the field is omitted; the request still succeeds.
  • No regression in existing oracle tests.

Rollback. Gate the attacher calls behind APOLLO_GUIDANCE_ATTACH_ENABLED, default false until all of Milestone 3 is merged.


Milestone 4 — Subscriber SDK (ApolloGuidanceCache)

Purpose. Ship the canonical in-process cache that every L1 and L3 agent uses to consume attached apollo_guidance payloads.

Scope.

  • New module inside oracle at apollo/sdk/guidance_cache.py. Axonis-core is not touched at this milestone. The SDK is published under apollo.sdk as the canonical source; the file later relocates into axonis_core/apollo/ (the canonical reference per Q15). Subscribers SHOULD import it directly from axonis-core (oracle's path; cortex's path in M14). Vendoring under a subscriber's own namespace remains documented in Q15 as a fallback for any future agent whose dep posture rules out taking on axonis-core, but Phase 1 subscribers all import canonical.
  • API: update(apollo_guidance_block) sink + six canonical accessors (design §Subscriber SDK table).
  • Ordering: (weight desc, recency desc) when multiple artifacts match.
  • Applicability filtering inside the cache.
  • Empty-cache fallback: accessors return empty lists / None without blocking.
  • Pure Python data structures; no HTTP client, no transport, no ML dependency. No Apollo imports: the SDK is deliberately decoupled from apollo.artifacts so the M5 file move lands cleanly.

Acceptance.

  • Unit: cache.update() replaces the full artifact set idempotently; update(None) is a no-op (preserves prior cache); accessors filter by intent context; empty cache returns empty results; ordering is stable.
  • No new dependency added to axonis-core.
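The cache semantics listed above (full replacement on update, no-op on None, applicability filtering, and weight-then-recency ordering) might look like the sketch below. It is not the real ApolloGuidanceCache: the class name `GuidanceCacheSketch` and the artifact fields `type`, `weight`, `updated_ts`, and `applies_to` are assumptions made for illustration, and only one of the six accessors is shown.

```python
class GuidanceCacheSketch:
    """Illustrative in-process cache; field names are assumed, not canonical."""

    def __init__(self):
        self._artifacts = []

    def update(self, block):
        # update(None) is a no-op: the prior cache is preserved.
        if block is None:
            return
        # Replace the full artifact set; repeating the same call is idempotent.
        self._artifacts = list(block.get("artifacts", []))

    def _matching(self, artifact_type, intent_context):
        intent = (intent_context or {}).get("intent_class")
        hits = [
            a for a in self._artifacts
            if a.get("type") == artifact_type
            # Assumed rule: an artifact without "applies_to" applies to every intent.
            and (not a.get("applies_to") or intent in a["applies_to"])
        ]
        # Weight descending, then recency descending; sorted() is stable.
        return sorted(hits, key=lambda a: (-a.get("weight", 0.0), -a.get("updated_ts", 0)))

    def get_system_prompt_additions(self, intent_context=None):
        """Return PromptShim texts in priority order; empty list when nothing matches."""
        return [a["content"]["text"] for a in self._matching("PromptShim", intent_context)]
```

Keeping the filtering and ordering inside the cache means a subscriber only ever calls an accessor and concatenates the result, which is what keeps the module free of transport and Apollo imports.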

Rollback. The milestone adds a single new file; removing it breaks no existing imports (nothing depends on it until Milestone 5).


Milestone 5 — Phase-1 emitter integration

Purpose. Wire oracle as the sole observer for Phase-1. Oracle emits on behalf of L1, L2, and L3 via in-process oracle.apollo.observer.ingest calls — no HTTP, no cross-process emission from cortex. Oracle attaches apollo_guidance on every outbound /chat response (L1) and every MCP dispatch bound for an agent-kind L3 service (cortex). The L3 consumption of that attached guidance is wired in M14 (cortex); the L1 consumption side is deferred until a beacon↔oracle connection is designed (see M14 §Out of scope). Through M5–M13 the attach side is live but no subscriber yet reads the field. The observation side is oracle-only.

This milestone operationalizes component.oracle.apollo §Invariants 14 and §Ingest Semantics: neither L1 nor L3 addresses Apollo directly. Both talk to oracle; oracle talks to Apollo.

Scope.

  • Oracle: L1 and L2 emissions (unchanged from prior plan).
      • In oracle/server/api/routes.py: emit intent_schema (from /chat body if present), user_prompt (from /chat body), final_response (on serialize), user_feedback (on a new feedback endpoint or extended existing one).
      • In oracle/server/llm/router.py / tool_executor.py: emit llm_turn on every LLM request/response cycle inside oracle.
  • Oracle: L3 emissions (new; replaces per-service emission).
      • Add emit_tool_output and emit_tool_error helpers to oracle/apollo/hooks/chat.py, matching the existing emit_user_prompt / emit_llm_turn / emit_final_response shape. Helpers take the caller identity, trace_id, conversation_id, service name, tool name, latency, and the tool input/output (or error), and call oracle.apollo.observer.ingest.ingest(...) in-process.
      • In oracle/server/llm/tool_executor.py: after _call_backend_tool returns (success or raised error), call emit_tool_output or emit_tool_error with the observed result. This is the emission point for every tool dispatch oracle's LLM tool-use loop makes.
      • In oracle/server/mcp/server.py (MCP proxy path): after the proxied dispatch completes, emit the same way. This covers MCP tool calls that arrive from external MCP clients and are forwarded to L3 services through oracle.
      • All emissions flow through apollo.observer.ingest.ingest(...): no HTTP, no ApolloClient.
  • Cortex: no Apollo emission code at this milestone.
      • Ensure cortex's GET /service-info response (or equivalent registration payload) declares "component_kind": "agent" so oracle attaches apollo_guidance on MCP dispatches to it and so Evaluator signal 2 applies.
      • At M5, cortex receives apollo_guidance in MCP arguments but does not yet read it: its tool signatures don't declare the field, so FastMCP silently strips it before invocation. Subscriber consumption (cache.update + accessor reads) is wired in M14. Through M5–M13, attach is live and consumption is dark.
  • Parallax: deferred from Phase 1. Same observer + subscriber pattern as cortex when it onboards.
  • Secondary path (POST /api/v1/apollo/observations). Remains mounted from Milestone 1 for admin replay/seed and for future out-of-process emitters; not exercised by Phase-1 services.
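The L3 emission point described above can be sketched as a wrapper around the backend call. The helper names emit_tool_output and emit_tool_error come from the plan, but their signatures here are guesses; `ingest`, `observations`, and `dispatch_with_emission` are hypothetical stand-ins, and the real tool executor is presumably asynchronous while this sketch is synchronous for brevity.

```python
import time

observations = []  # stand-in for Apollo's in-process ingest queue


def ingest(envelope):
    """Hypothetical stand-in for oracle.apollo.observer.ingest.ingest(...)."""
    observations.append(envelope)


def _emit(kind, *, caller, trace_id, conversation_id, service, tool,
          latency_ms, tool_input, **extra):
    ingest({
        "kind": kind, "caller": caller, "trace_id": trace_id,
        "conversation_id": conversation_id, "service": service, "tool": tool,
        "latency_ms": latency_ms, "input": tool_input, **extra,
    })


def emit_tool_output(*, output, **common):
    _emit("tool_output", output=output, **common)


def emit_tool_error(*, error, **common):
    _emit("tool_error", error=error, **common)


def dispatch_with_emission(call_backend_tool, *, caller, trace_id,
                           conversation_id, service, tool, arguments):
    """Call the backend tool and emit exactly one observation for the attempt."""
    common = dict(caller=caller, trace_id=trace_id, conversation_id=conversation_id,
                  service=service, tool=tool, tool_input=arguments)
    started = time.monotonic()
    try:
        result = call_backend_tool(tool, arguments)
    except Exception as exc:
        latency_ms = (time.monotonic() - started) * 1000.0
        emit_tool_error(error=repr(exc), latency_ms=latency_ms, **common)
        raise  # the caller's own error handling is unchanged
    latency_ms = (time.monotonic() - started) * 1000.0
    emit_tool_output(output=result, latency_ms=latency_ms, **common)
    return result
```

Because the emission happens inside oracle around the dispatch, the L3 service needs no Apollo client at all, which is the property the cortex acceptance criterion checks.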

Acceptance.

  • Integration: a user /chat request produces a full lineage of observations under a single trace_id, all emitted by oracle in-process. The lineage includes user_prompt, oracle llm_turn, tool_output / tool_error for every MCP dispatch oracle made to cortex, and final_response.
  • Oracle attaches apollo_guidance to its /chat response and to every outbound MCP dispatch bound for cortex. Subscriber consumption is dark at M5; M14 closes that loop.
  • Cortex's source tree contains zero imports of ApolloClient or ApolloIntegration and zero lifespan wiring for Apollo emission. The only Apollo-facing change in cortex at M5 is the component_kind field in its registration payload.
  • Full lineage query (Milestone 7) returns every event for the trace.

Rollback. The in-process emission helpers are guarded by the existing APOLLO_EMITTER_ENABLED flag in oracle/apollo/settings.py; flipping it to false disables all oracle-side emission without touching oracle's request path. The secondary HTTP path remains mounted regardless.


Milestone 6 — Trace propagation (W3C traceparent)

Purpose. End-to-end trace stitching via the W3C traceparent header. Every observation oracle emits on behalf of L1, L2, or L3 for a single /chat request carries the same trace_id; every outbound call oracle makes to L3 forwards the same traceparent header so L3's own logs can correlate.

This milestone operationalizes component.oracle.apollo §Trace Propagation under the oracle-sole-observer rule from M5. The wire path is L1 → Oracle → L3 — Apollo is never on the wire. Oracle's in-process emitters stamp the envelope trace_id directly from the ambient value. Outbound calls to L3 carry the header only for L3's own logging correlation; L3 never forwards it to Apollo because L3 never addresses Apollo (§Invariants 14).

Scope.

  • Canonical trace module (axonis-core, strictly additive). New axonis/core/trace.py:
      • TraceContext + parse_traceparent / format_traceparent / mint_traceparent: W3C-conformant parser with strict validation (reserved all-zero trace-ids rejected, unknown version byte rejected).
      • Ambient ContextVar holding the raw 4-segment string; set_current_traceparent, get_current_traceparent, current_trace_id() accessors.
      • No new dependency: pure Python, ~20 lines of parsing.
  • Ingress: oracle mints on receipt. New oracle/server/middleware/trace.py::TraceparentMiddleware, installed outside OAuthMiddleware in oracle/server/__main__.py:
      • Reads APOLLO_TRACE_HEADER (default traceparent) on every non-skip request; skip paths are /health and /service-info.
      • Parses via the canonical module. On valid: installs on the ContextVar unchanged (oracle never re-mints mid-request). On missing: mints a replacement + increments apollo_missing_traceparent_total. On malformed: same behavior + increments apollo_malformed_traceparent_total.
      • APOLLO_REQUIRE_TRACEPARENT=true (flipped in Phase 3) short-circuits both failure paths with a 400 response.
  • Propagation helper. Extend axonis_core.gateway.client.extract_http_headers() to forward traceparent alongside Authorization. Pulls from the inbound headers when provided; otherwise falls back to the ambient ContextVar. This is the single source of truth for any gateway client: MCPClient, RestClient, and the federation layer inherit traceparent forwarding for free without being touched directly.
  • Outbound: oracle's direct httpx paths. Oracle's L3-dispatch paths use httpx.AsyncClient directly rather than the gateway clients, so they inject traceparent explicitly:
      • oracle/server/llm/tool_executor.py::_call_backend_tool: adds the ambient traceparent to outbound headers on every backend MCP POST.
      • oracle/server/mcp/server.py MCP proxy: same.
  • In-process emitters carry trace_id directly. apollo/hooks/chat.py helpers already accept a trace_id parameter; oracle/server/api/routes.py::/chat now sources it from current_trace_id() (the ambient value installed by TraceparentMiddleware) and threads it into every helper call. The MCP proxy path does the same, falling back to a locally-minted id only when invoked outside HTTP ingress (e.g., direct programmatic tests). No header is on the primary path; emissions are in-process function calls.
  • Secondary-path stamping. ApolloClient.emit() in axonis_core/core/apollo/client.py stamps traceparent from the ambient ContextVar on every POST; the envelope trace_id (caller-set) wins on conflict per §Envelope mapping. Phase-1 emitters (oracle + cortex) never use this client; admin replay and out-of-process emitters do.
  • Config. APOLLO_TRACE_HEADER (default traceparent) and APOLLO_REQUIRE_TRACEPARENT (default false through Phases 1–2) already exist in oracle/apollo/settings.py; no new env var surface.
  • L1 expectation. L1 (beacon, browser clients, any direct /chat caller) is expected to mint traceparent on every new request. Document this contract; do not enforce in best-effort mode (oracle mints on absence and serves the request).

Acceptance.

  • Integration: /chat → oracle → cortex → oracle. Every observation oracle emits (L1 user_prompt, L2 llm_turn, L3 tool_output / tool_error, L2 final_response) carries the same trace_id, which is the one installed by TraceparentMiddleware. Lineage query stitches them.
  • Outbound MCP dispatches from oracle to L3 include a traceparent header with the same trace_id, so L3's own logs can correlate against oracle's observations even though L3 never talks to Apollo.
  • TraceparentMiddleware mints a replacement and increments apollo_missing_traceparent_total when the inbound header is absent; increments apollo_malformed_traceparent_total when the header is present but malformed (best-effort mode). Returns 400 on either condition in required mode.
  • ApolloClient.emit() stamps the ambient traceparent on secondary-path POSTs; envelope trace_id wins on header-vs-envelope conflict.
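The parsing and ingress rules for this milestone can be sketched in a few lines. The function names parse_traceparent, mint_traceparent, and current_trace_id follow the plan, but the bodies are a guess at the behaviour, not the canonical module; `resolve_inbound` and the `counters` dict are hypothetical stand-ins for TraceparentMiddleware and its metrics. The header layout (version, 32-hex trace-id, 16-hex parent-id, 2-hex flags) is the W3C Trace Context format.

```python
import re
import secrets
from contextvars import ContextVar

# version "-" trace-id "-" parent-id "-" flags, all lowercase hex.
_PATTERN = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")
_current_traceparent = ContextVar("traceparent", default=None)
counters = {
    "apollo_missing_traceparent_total": 0,
    "apollo_malformed_traceparent_total": 0,
}


def parse_traceparent(raw):
    """Return the four segments as a dict, or None when the value is invalid."""
    match = _PATTERN.fullmatch(raw) if isinstance(raw, str) else None
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    if version != "00":  # strict mode: unknown version byte rejected
        return None
    if trace_id == "0" * 32 or parent_id == "0" * 16:  # reserved all-zero ids
        return None
    return {"version": version, "trace_id": trace_id,
            "parent_id": parent_id, "flags": flags}


def mint_traceparent():
    # Flags "01" (sampled) is an assumption; the plan does not specify it.
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"


def current_trace_id():
    raw = _current_traceparent.get()
    return raw.split("-")[1] if raw else None


def resolve_inbound(header_value, require=False):
    """Return (traceparent, http_status) for one inbound request."""
    if parse_traceparent(header_value) is not None:
        _current_traceparent.set(header_value)  # installed unchanged
        return header_value, 200
    key = ("apollo_missing_traceparent_total" if header_value is None
           else "apollo_malformed_traceparent_total")
    counters[key] += 1  # assumption: counted in required mode as well
    if require:
        return None, 400
    minted = mint_traceparent()
    _current_traceparent.set(minted)
    return minted, 200
```

Storing the raw four-segment string rather than a parsed object keeps outbound forwarding trivial: the same value that arrived is the value that goes into the outbound header.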

Rollback. Revert the TraceparentMiddleware installation in server/__main__.py, the extract_http_headers extension, the ApolloClient traceparent stamping, and the outbound header injections in tool_executor.py / mcp/server.py. axonis/core/trace.py stays (no import breaks if nothing calls it). Lineage stops stitching across services but every other pathway — ingest, emit helpers, guidance attachment — works unchanged.


Milestone 7 — Admin inspection surface

Purpose. Operators can see everything Apollo has captured before Apollo is allowed to mutate anything autonomously. Every endpoint in this milestone is admin-only — neither L1 nor L3 ever calls them (§Invariants 14). The only production-path traffic Apollo serves is the response-attached guidance payload and the secondary-path POST /observations; everything under /api/v1/apollo/memories|artifacts|guidance|audit|stats|chat in this milestone is admin inspection.

Scope.

  • REST endpoints (all admin-only):
      • GET /api/v1/apollo/memories, GET /memories/{id}, POST /memories (seed), PATCH /memories/{id}, DELETE /memories/{id}
      • GET /api/v1/apollo/artifacts (returns empty today; populated Milestone 8+)
      • GET /api/v1/apollo/guidance?scope=l1 and ?scope=l3:<service> (preview the currently-attachable set; empty today)
      • GET /api/v1/apollo/subscribers (list currently-connected admin SSE debug streams)
      • GET /api/v1/apollo/guidance/stream?scope=<scope> (admin-only SSE debug feed; use cortex's event_stream.py as reference)
      • GET /api/v1/apollo/audit (returns empty today; populated Milestone 9)
      • GET /api/v1/apollo/stats (populated with all counters registered so far)
  • oracle/apollo/chat/server.py: read-only admin chat (can inspect observations / lineage / graphs; cannot yet act). Uses Apollo's LLM (see Milestone 8) or, until Milestone 8, a stub that only runs list_memories and get_memory tools.
  • oracle/apollo/chat/tools.py: read-only tool set for this milestone.
  • Auth: every endpoint gated by role admin via oracle's guardrails.

Acceptance.

  • Admin can query observations by trace_id, inspect graph state, and read /stats.
  • Non-admin callers receive 403 on every admin endpoint.
  • Admin SSE debug feed emits per Curator commit (no commits yet, but the wiring is proven with a synthetic emit).

Rollback. Remove the endpoint registrations; inspection is lost but no data is affected.


Milestone 8 — LLM synthesis + graph-anchor drift check

Purpose. Apollo's LLM runs. It proposes artifact mutations. The graph-anchor drift check gates every proposal. Nothing is committed autonomously yet — proposals go to a pending-review queue.

Scope.

  • oracle/apollo/llm.py: Apollo's LLM client. Three providers ship in M8:
      • openai (production default): the existing openai SDK pointed at any OpenAI-compatible endpoint via APOLLO_LLM_BASE_URL (e.g., MiniMax's hosted endpoint). No new dependency.
      • minimax-local (scaffolded; dev / air-gapped): lazy HuggingFace transformers load of the stock MiniMax checkpoint using the canonical model-card signature:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stock MiniMax checkpoint, loaded with the model-card signature.
tokenizer = AutoTokenizer.from_pretrained("MiniMaxAI/MiniMax-M2.7", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("MiniMaxAI/MiniMax-M2.7", trust_remote_code=True)
```

        Weights are resolved from the standard HF cache at ${HF_HOME:-~/.cache/huggingface}/hub/models--MiniMaxAI--MiniMax-M2.7/. See design spec §Apollo's LLM → Local MiniMax via HuggingFace for the full contract (pre-pull pattern, disk/GPU expectations, deferred production knobs like APOLLO_LLM_LOCAL_MODEL_PATH, device mapping, thread-pool offload, streaming). M8 ships the scaffold only; the deferred knobs are explicitly out of scope for this milestone and tracked under §Out of scope for this plan.
      • stub (tests + bootstrapping): canned responses registered via LLMClient.install_stub_response(...); no network, no GPU, no model dep. Used throughout the test suite to drive synthesis deterministically.

    The minimax alias is accepted as a synonym for openai (with MiniMax's endpoint as the assumed base URL) so APOLLO_LLM_PROVIDER=minimax stays meaningful; the in-code dispatch funnels minimax and openai through the same provider path.
  • oracle/apollo/learner/synthesis.py: event-driven dispatcher. Triggers fire off observations oracle has already ingested into the in-process queue (component.oracle.apollo §Invariants 14: oracle is the sole Phase-1 emitter, so "L1 observation ingested" means "oracle emitted an L1-origin envelope on L1's behalf"). Concurrency is bounded by APOLLO_SYNTHESIS_MAX_CONCURRENT, with trace_id coalescing. Triggers (design §Synthesis triggers):
      • L1-origin observation ingested (intent_schema / user_prompt), emitted by oracle from the /chat request body
      • L3-origin observation ingested (tool_output / tool_error / final_response), emitted by oracle after observing the MCP round-trip (or the /chat serialization point for final_response)
      • Admin chat turn
      • Admin-initiated synthesis via POST /api/v1/apollo/learn
  • oracle/apollo/learner/prompts.py: prompt templates for synthesis (intent classification, failure-pattern extraction, etc.).
  • oracle/apollo/learner/drift.py: graph-anchor drift check. Every LLM output passes through four checks (proposed-pattern-vs-edges, intent-classification-vs-clusters, weight swings, trajectory coherence). Divergent proposals produce a DriftEvent and enter the pending-review queue; consistent proposals are approved for commit (commit itself happens in Milestone 9).
  • Write artifact proposals to a new apollo_artifact_proposals transient store, or directly to apollo_artifacts with a status: "pending_admin_review" | "approved" field. Pick one; the design spec does not mandate.

Acceptance.

  • Integration: a synthetic sequence of L3 tool_error observations triggers an LLM call; the LLM proposes a FailurePattern; the graph-anchor check validates it against the outcome_graph; if consistent, the proposal is marked approved and ready for commit; if not, it produces a DriftEvent.
  • Unit: drift-check unit tests cover each of the four checks.
  • LLM swap test: setting APOLLO_LLM_PROVIDER=openai and APOLLO_LLM_MODEL=gpt-4 changes provider with no code change.

Rollback. Disable the synthesis trigger entries; deterministic graph updates continue, no proposals generated.


Milestone 9 — Curator commits + versioning + audit

Purpose. Approved proposals become committed artifacts. Every mutation is versioned and audited. Hard invariants enforced.

Scope.

  • oracle/apollo/curator/actions.py: promote, demote, forget, edit, compact. Each wraps the mutation + history-write + audit-write as one atomic unit.
  • oracle/apollo/curator/policy.py: hard invariants (§Curator → Disallowed actions). Every action first passes through the policy gate.
  • oracle/apollo/curator/audit.py: writes apollo_audit records. Schema per design §Audit log: action, actor, trigger, artifact_id, before_version_id, after_version_id, evaluator_score, score_decomposition, upstream_artifact_ids, rationale (required, non-empty), evidence_ref, indefinite, admin_note.
  • Templates: apollo_artifacts_mapping.json, apollo_artifact_history_mapping.json, apollo_audit_mapping.json.
  • Rollback endpoint: POST /api/v1/apollo/artifacts/{id}/rollback with target version. Rollback is itself a versioned + audited event.
  • Admin-only write endpoints: POST /artifacts/{id}/promote, POST /artifacts/{id}/demote, DELETE /artifacts/{id}.
  • Rationale generation: LLM-synthesized for LLM-driven actions; templated from score decomposition for deterministic Evaluator actions (see Milestone 10).
  • At this milestone, autonomous Curator is disabled. Every mutation requires an admin trigger via the chat or admin endpoints. (Milestone 12 flips this to autonomous.)

Acceptance.

  • Integration: admin promotes an artifact; the prior version lands in apollo_artifact_history; current version in apollo_artifacts; an audit record with non-empty rationale lands in apollo_audit.
  • Integration: admin rolls back; rollback writes a new version whose prev_version_id points at the version that was current immediately before the rollback; rollback itself is audited.
  • Unit: every disallowed action raises CuratorPolicyViolation; no mutation occurs; no audit record is written.
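The gate-then-commit order behind these acceptance criteria can be sketched with in-memory stores. Everything here is illustrative: `CuratorSketch`, `seed`, `apply`, and the `PROTECTED_TARGETS` set are hypothetical (the set is taken from the statement that Apollo cannot change auth, guardrails, token scopes, or user data, not from the design's §Disallowed actions list), and plain lists stand in for the apollo_artifacts, apollo_artifact_history, and apollo_audit indices.

```python
import copy


class CuratorPolicyViolation(Exception):
    """Raised when an action targets state outside Apollo's own artifacts."""


# Assumed from the Purpose section; the real list lives in the design spec.
PROTECTED_TARGETS = {"auth", "guardrails", "token_scopes", "user_data"}


class CuratorSketch:
    def __init__(self):
        self.artifacts = {}  # artifact_id -> current version
        self.history = []    # prior versions
        self.audit = []      # audit records
        self._next_version = 1

    def _mint_version(self):
        version_id = f"v{self._next_version}"
        self._next_version += 1
        return version_id

    def seed(self, artifact_id, status):
        self.artifacts[artifact_id] = {
            "artifact_id": artifact_id, "status": status,
            "version_id": self._mint_version(), "prev_version_id": None,
        }

    def apply(self, action, artifact_id, new_status, *, actor, rationale,
              target="artifact"):
        # Policy gate runs first, so a violation leaves every store untouched.
        if target in PROTECTED_TARGETS:
            raise CuratorPolicyViolation(f"{action} on {target} is not allowed")
        if not rationale.strip():
            raise ValueError("rationale is required and must be non-empty")
        before = self.artifacts[artifact_id]
        after = dict(before, status=new_status,
                     version_id=self._mint_version(),
                     prev_version_id=before["version_id"])
        record = {"action": action, "actor": actor, "artifact_id": artifact_id,
                  "before_version_id": before["version_id"],
                  "after_version_id": after["version_id"],
                  "rationale": rationale}
        # Mutation, history write, and audit write are applied together.
        self.history.append(copy.deepcopy(before))
        self.artifacts[artifact_id] = after
        self.audit.append(record)
        return after
```

With a real document store the three writes would need a transactional or compensating mechanism to be atomic; the sketch only shows the ordering.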

Rollback. Disable the write endpoints; proposals sit in the pending queue without ever being committed.


Milestone 10 — Evaluator scoring + L3 performance amplification

Purpose. Close the feedback loop. Per-artifact rolling scores drop as outcomes degrade. L3 performance signals carry amplified weight.

Scope.

  • oracle/apollo/evaluator/signals.py: detectors for the four failure signals (L3 error, L3 schema mismatch, user feedback, evaluator confidence). Signal 2 is gated on component_kind == "agent" of the observed service (the envelope's service field). Oracle is the actual emitter under oracle-sole-observer, but the component-kind contract keys on the L3 target oracle observed. Look up the kind from ServiceRegistry at signal-application time.
  • oracle/apollo/evaluator/scoring.py: rolling EMA per artifact. Weight tiers APOLLO_EVALUATOR_WEIGHT_L3_ERROR (3.0), APOLLO_EVALUATOR_WEIGHT_SCHEMA_MISMATCH (3.0), APOLLO_EVALUATOR_WEIGHT_USER_FEEDBACK (1.5), APOLLO_EVALUATOR_WEIGHT_CONFIDENCE (0.5).
  • oracle/apollo/evaluator/cascade.py: when an L3-dominant score drop occurs, flag upstream IntentPattern / PromptShim / SpecFragment for review on the next synthesis trigger.
  • Demotion cadence: normal N=5 cycles, L3-dominant N=2 (APOLLO_EVALUATOR_L3_FAST_DEMOTE_N). The Curator reads these thresholds when recommending demote actions (still admin-triggered at this milestone).
  • Repeated L3 failures on the same artifact within a short window escalate to a DriftEvent rather than silent demotion.
  • Score decompositions preserved on every audit record so admins can see why a score moved.

Acceptance.

  • Integration: synthetic L3 errors on traces that used pshim_xyz drive the artifact's rolling score below threshold; after N=2 cycles the Curator recommends demotion (visible in admin chat); upstream artifacts are flagged for synthesis review.
  • Unit: weight math matches design-spec tiers; signal 2 is dark for events observed on library-kind services.
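One way the weight tiers and the two demotion cadences could interact is sketched below. The four weights and the N=5 / N=2 cadences come from the plan; everything else is an assumption made for illustration: the score range of 0 to 1, the base smoothing factor, the 0.5 threshold, the rule that a signal's weight scales the EMA step, and the rule that "L3-dominant" means L3 signals contributed more total weight than the others.

```python
WEIGHTS = {  # tiers from the plan
    "l3_error": 3.0,
    "schema_mismatch": 3.0,
    "user_feedback": 1.5,
    "confidence": 0.5,
}
L3_SIGNALS = {"l3_error", "schema_mismatch"}
BASE_ALPHA = 0.1       # assumed smoothing factor
THRESHOLD = 0.5        # assumed demotion threshold
NORMAL_DEMOTE_N = 5
L3_FAST_DEMOTE_N = 2


class RollingScore:
    """Per-artifact EMA in [0, 1]; failures pull toward 0, successes toward 1."""

    def __init__(self):
        self.score = 1.0
        self.cycles_below = 0
        self.l3_weight = 0.0
        self.other_weight = 0.0

    def observe(self, signal=None):
        """Fold one cycle into the score; signal=None means a clean outcome."""
        if signal is None:
            alpha, target = BASE_ALPHA, 1.0
        else:
            # Heavier signals take a proportionally larger EMA step.
            alpha, target = min(1.0, BASE_ALPHA * WEIGHTS[signal]), 0.0
            if signal in L3_SIGNALS:
                self.l3_weight += WEIGHTS[signal]
            else:
                self.other_weight += WEIGHTS[signal]
        self.score += alpha * (target - self.score)
        self.cycles_below = self.cycles_below + 1 if self.score < THRESHOLD else 0
        return self.score

    def decomposition(self):
        """Weight totals that an audit record could carry."""
        return {"l3": self.l3_weight, "other": self.other_weight}

    def recommend_demotion(self):
        l3_dominant = self.l3_weight > self.other_weight
        needed = L3_FAST_DEMOTE_N if l3_dominant else NORMAL_DEMOTE_N
        return self.cycles_below >= needed
```

Under these assumptions three consecutive L3 errors are enough to trigger a recommendation, while the same number of low-confidence signals barely moves the score, which is the amplification the milestone is after.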

Rollback. Disable Evaluator runs on the ingest worker; scores stop updating; existing scores preserved.


Milestone 11 — Admin chat empowerment

Purpose. Full conversational admin surface: explain / discuss / act. Admin's chain of reasoning is preserved in the audit log alongside Apollo's.

Scope.

  • Extend oracle/apollo/chat/tools.py with the full tool set (design §Admin Chat):
      • list_memories, get_memory, forget_memory
      • promote_artifact, demote_artifact, rollback_artifact, forget_artifact
      • rollback_graph, trigger_synthesis
      • explain_decision(trace_id | artifact_id | audit_id): retrieves stored rationale and resolves evidence_ref
      • list_decisions, discuss_decision: conversational review of Curator actions
      • pause_curator, resume_curator
  • Admin actions via chat are audited with actor: "admin:<username>" and a fresh rationale capturing the admin's reasoning (or the tool-specific default).
  • Private admin-chat MCP endpoint (mounted by oracle.apollo.chat.server) exposes these tools to Apollo's LLM; not aggregated into oracle's user-facing /agentspace catalog.
  • indefinite: true flag wiring for critical admin actions (forget, pause/resume, rollback): writes audit records with null expires_ts.

Acceptance.

  • Admin chat test: admin asks "why did you demote pshim_xyz?"; Apollo's LLM calls explain_decision, retrieves the audit record, and presents the rationale + evidence in prose.
  • Admin chat test: admin says "roll it back"; Apollo's LLM calls rollback_artifact; audit record with actor: "admin:<username>" and indefinite: true is written; injection-path payload on the next request reflects the rollback.
  • Non-admin users receive 403 on /api/v1/apollo/chat.

Rollback. Revert to the read-only tool set; admin can inspect but not act.


Milestone 12 — Autonomous Curator + drift prevention tuning

Purpose. Flip the Curator to autonomous for evolution-class proposals. Drift-class proposals still require admin review. Pause/resume broadcast works. Drift thresholds tuned based on Milestone 8–11 production data.

Scope.

  • Remove the "every mutation requires admin trigger" gate from Milestone 9. Curator now commits Evolution proposals autonomously after graph-anchor check passes.
  • pause_curator() / resume_curator() set a process-wide flag; while paused, Curator refuses all mutations (even admin-triggered), and the admin-SSE debug feed emits a broadcast event with the pause status.
  • Drift thresholds (z-score on weight deltas, rate-of-new-nodes caps, divergence tolerance) become production-tuned. Defaults proposed in the design spec are reasonable starting points; admin can tune via APOLLO_DRIFT_* settings.
  • Repeated-L3-failure escalation to DriftEvent is wired end-to-end (visible in the admin audit feed).
  • LLM-driven compaction of expiring observations (default: triggered on admin-initiated synthesis; can be fully autonomous if admin enables APOLLO_COMPACTION_AUTO).

Acceptance.

  • Integration: evolution-class proposals commit without admin intervention; audit records show actor: "curator_auto".
  • Integration: drift-class proposals produce DriftEvent and sit in the pending queue; admin must approve via chat.
  • Integration: pause_curator halts all mutations; observing queued proposals in /stats shows them untouched until resume_curator.

Rollback. Gate autonomous commit behind APOLLO_CURATOR_AUTONOMOUS=false; Apollo reverts to Milestone 9 behavior.


Milestone 13 — Maintenance + /stats polish + degraded emitters

Purpose. Ops readiness. Hourly maintenance, snapshot coarsening, and a comprehensive /stats surface that surfaces every counter + per-service health.

Scope.

  • oracle/apollo/maintenance.py: periodic background job (APOLLO_MAINTENANCE_INTERVAL, default 1 h) that:
      1. Runs axonis_core.elastic.Elastic.delete_by_query on every index where expires_ts < now().
      2. Coarsens apollo_graph_snapshots: delete hourly snapshots older than 7 days if a daily snapshot exists for that day; delete daily snapshots older than 30 days if a weekly snapshot exists.
      3. Emits job metrics (apollo_maintenance_last_run_ts, apollo_maintenance_docs_deleted_total).
  • /stats surface expanded to include per-service degraded_emitters array. A service is "degraded" when its apollo_ingest_last_ingest_ts{service} is older than APOLLO_INGEST_STALE_WARN_SEC, or when queue/lag thresholds are breached. Under oracle-sole-observer (§Invariants 14), for Phase-1 services this means oracle stopped observing them (e.g., oracle hasn't dispatched an MCP call to cortex in five minutes), not that a POST to Apollo failed. For secondary-path emitters (admin replay / out-of-process services), it means their POSTs stopped arriving.
  • Admin audit surface: GET /api/v1/apollo/audit query filters (time range, action, actor, artifact id, artifact type, trigger, score-decomposition terms).
  • intent_schema_coverage stat (design §Layer 1 Intent Schema Obligation): percentage of traces with intent_schema in the last rolling window. Computed from oracle's in-process emissions (the only source of intent_schema observations in Phase 1).

Acceptance.

  • Maintenance job runs on schedule; expired observations are delete_by_query'd; snapshot coarsening works across the three tiers.
  • /stats exposes every counter named in the design spec's metrics tables, including apollo_ingest_last_ingest_ts for each Phase-1 service.
  • degraded_emitters correctly flags a Phase-1 service when oracle hasn't observed it within the stale window.
  • intent_schema_coverage updates correctly with a rolling window.
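The snapshot-coarsening rule can be expressed as a pure selection function, which keeps it testable without an index. This is a sketch under stated assumptions: the snapshot shape (`id`, `tier`, `ts`) and the function name `snapshots_to_delete` are hypothetical, and the plan does not say which weekly snapshot covers a daily one, so the sketch assumes "a weekly snapshot in the same ISO week".

```python
from datetime import datetime, timedelta, timezone

HOURLY_MAX_AGE = timedelta(days=7)
DAILY_MAX_AGE = timedelta(days=30)


def _iso_week(ts):
    """(ISO year, ISO week) for a timestamp."""
    return tuple(ts.isocalendar()[:2])


def snapshots_to_delete(snapshots, now):
    """Return ids of snapshots that a coarser tier already covers."""
    daily_days = {s["ts"].date() for s in snapshots if s["tier"] == "daily"}
    weekly_weeks = {_iso_week(s["ts"]) for s in snapshots if s["tier"] == "weekly"}
    doomed = []
    for snap in snapshots:
        age = now - snap["ts"]
        if (snap["tier"] == "hourly" and age > HOURLY_MAX_AGE
                and snap["ts"].date() in daily_days):
            doomed.append(snap["id"])
        elif (snap["tier"] == "daily" and age > DAILY_MAX_AGE
                and _iso_week(snap["ts"]) in weekly_weeks):
            # Assumption: coverage means a weekly snapshot in the same ISO week.
            doomed.append(snap["id"])
    return doomed
```

A snapshot is only dropped when its coarser replacement exists, so a missed daily or weekly run leaves the finer-grained data in place rather than creating a gap.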

Rollback. Disable the maintenance scheduler; manual delete_by_query still possible via admin endpoint.


Milestone 14 — Subscriber LLM consumption (cortex L3)

Purpose. Close the L3 consumption side of the Injection Channel. Through M0–M13, oracle attaches apollo_guidance to every outbound MCP dispatch bound for an agent-kind L3 service — but cortex's MCP tool signatures don't declare the field, so FastMCP silently strips it before the handler runs. The brain is thinking; the body isn't listening. M14 wires the consumption side into cortex per Q20's locked contract. After M14, integration tests prove guidance changes the system prompt of a live LLM call inside a cortex tool rather than being attached and discarded.

Beacon (L1) is out of scope for this milestone. Beacon has no HTTP connection to oracle today (MCP_SERVER_URL defaults to http://localhost:8000/mcp, which is cortex direct). Until a beacon↔oracle connection is designed and wired (separate spec decision), apollo_guidance has no path into beacon's process. M14 ships the L3 reference implementation; the L1 wiring follows the same SDK pattern once beacon talks to oracle.

Scope — Cortex (L3).

  • Add axonis-core>=0.1.0 to cortex/pyproject.toml and install editable from ../axonis-core. Cortex imports the SDK directly from the canonical location — from axonis.core.apollo.guidance_cache import ApolloGuidanceCache — matching oracle's import pattern. Single source of truth, no drift-management overhead. (M14 development briefly explored vendoring the module locally to keep cortex lightweight, but cortex was already pulling from axonis.core.llm import LLMSpec for its narrative tool, so the lightweight-agent argument didn't survive contact with the codebase. Vendoring remains documented in Q15 as an option for future subscribers whose dep posture differs.)
  • Replace the design-note stub at cortex/cortex/server/app.py:40–43 with executable wiring. The constraint "cortex never addresses Apollo directly" remains true — M14 only adds consumption of guidance that oracle has already attached.
  • MCP handler integration: in cortex/cortex/server/mcp_handler.py:_handle_tools_call, before calling call_tool(tool_name, arguments):
  • Pop apollo_guidance = arguments.pop("apollo_guidance", None) from the inbound arguments. This both prevents downstream tool signatures from receiving an unexpected kwarg and isolates the guidance for cache update.
  • Instantiate a request-scoped ApolloGuidanceCache (imported from axonis.core.apollo.guidance_cache), call cache.update(apollo_guidance).
  • Expose the cache to tool implementations via a ContextVar at cortex.session.apollo_cache (get_cache() / set_cache() / reset_cache() / populate_from_arguments() helpers) so tools that internally run an LLM can read accessors without the cache leaking across requests. The handler's finally block calls reset_cache(token) so failure paths still clean up.
  • Per-tool LLM call augmentation: for any cortex tool that internally issues an LLM call, fold cache.get_system_prompt_additions(intent_context) into the tool's system prompt and cache.get_tool_description_overrides(...) into its tool catalog before invocation. Tools that do not call an LLM internally simply ignore the cache — the contextvar is read-only and harmless when unread.
  • Cache lifetime: request-scoped. Created at the top of _handle_tools_call; the contextvar resets when the handler returns or the tool raises. No cross-request leakage.
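The handler steps above (pop the field, populate a request-scoped cache, expose it through a ContextVar, reset in a finally block) can be sketched as follows. The helper names get_cache, set_cache, and reset_cache follow the bullets; `StubGuidanceCache` and `handle_tools_call` are hypothetical stand-ins for ApolloGuidanceCache and cortex's `_handle_tools_call`, and the real handler may be asynchronous.

```python
from contextvars import ContextVar


class StubGuidanceCache:
    """Hypothetical stand-in for ApolloGuidanceCache."""

    def __init__(self):
        self.artifacts = []

    def update(self, block):
        # Missing, None, or malformed guidance is a no-op.
        if isinstance(block, dict) and isinstance(block.get("artifacts"), list):
            self.artifacts = list(block["artifacts"])


_cache_var = ContextVar("apollo_cache", default=None)


def get_cache():
    return _cache_var.get()


def set_cache(cache):
    return _cache_var.set(cache)


def reset_cache(token):
    _cache_var.reset(token)


def handle_tools_call(tool_name, arguments, call_tool):
    """Strip apollo_guidance, expose it via the ContextVar, then run the tool."""
    # Popping keeps the unexpected kwarg away from the tool's signature.
    guidance = arguments.pop("apollo_guidance", None)
    cache = StubGuidanceCache()
    cache.update(guidance)
    token = set_cache(cache)
    try:
        return call_tool(tool_name, arguments)
    finally:
        # Runs on success and on failure, so nothing leaks across requests.
        reset_cache(token)
```

A tool that never calls get_cache() is unaffected, which is why the change can land in the shared handler without touching tools that do not run an LLM.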

Tests.

  • Cortex unit: call _handle_tools_call with arguments containing apollo_guidance; assert the field is removed from arguments before the tool runs, and the contextvar holds a populated cache during the tool's execution and is reset afterward.
  • Cortex unit (failure posture): call _handle_tools_call with arguments lacking apollo_guidance and with apollo_guidance: None; assert no error is raised, the contextvar holds an empty cache, and accessors return empty lists / None.
  • Cortex integration: for a cortex tool that runs an LLM internally (or a thin test tool that records the system prompt it was given), assert the prompt contains each PromptShim's content.text from the attached apollo_guidance when the cache is populated, and is unchanged when guidance is absent.

Acceptance.

  • Cortex's MCP tool handler removes apollo_guidance from arguments before tool dispatch, populates a request-scoped cache, and exposes it to tool bodies via a contextvar.
  • Tests prove that an L3 LLM call inside a cortex tool observably changes when guidance is present vs. absent.
  • Oracle's attach behavior is unchanged; M3, M5, M11, M13 tests still pass.
  • Q20's failure posture is exercised: missing/None/malformed apollo_guidance is a no-op; the LLM call proceeds without guidance.

Rollback. Revert the per-service patches in cortex. Oracle's attach side has tolerated non-consumers since M3 (Q13 — attach budget overshoot omits the field without failure), so reverting M14 leaves the system in its M13 state with no functional regression at oracle.

Out of scope (this milestone).

  • Beacon (L1) integration. Deferred until a beacon↔oracle connection is designed. Beacon's MCP_SERVER_URL defaults to cortex direct, so attached apollo_guidance has no path into beacon's process today. Once that connection lands, beacon's L1 wiring follows the same SDK pattern (session-scoped cache + accessor reads before the upstream LLM call).
  • Parallax integration. Same wiring pattern as cortex (MCP handler argument pop + request-scoped cache contextvar) when it onboards. Tracked as a follow-on phase per the revised Q7 lineup.
  • Other L1/L3 services. Once cortex is the reference implementation for the Q20 contract, additional subscribers (titan, athena, testament, rest/fedai-rest, parallax, beacon) onboard by following the same pattern. No Apollo-side spec change required beyond the eventual beacon↔oracle connection design.


Milestone 15 — Subscriber LLM consumption (oracle L2)

Purpose. Close the L2 consumption side of the Injection Channel. Oracle's chat surface at POST /api/v1/chat runs its own LLM tool-use loop in oracle/server/llm/tool_executor.py (5-provider gateway: anthropic / openai / groq / ollama / trinity). Apollo's MiniMax synthesis LLM is independent of this; the L2 path makes Apollo's guidance available to oracle's own chat LLM, the same way M14 made it available to cortex's. After M15, integration tests prove guidance changes the system prompt of oracle's tool-executor LLM call rather than being computed and discarded.

Scope — Oracle (L2).

  • Add for_l2(...) to oracle/apollo/guidance/attacher.py — same shape as for_l1 and for_l3_agent. Returns the AttachedGuidance payload bounded by APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS. scope_label="l2". Records attribution under trace_id so the Evaluator correlates oracle-LLM signals back to the artifacts that shaped them, exactly as L1/L3 do.
  • Process-local ApolloGuidanceCache for oracle's chat LLM. Oracle owns one ApolloGuidanceCache instance (imported from axonis.core.apollo.guidance_cache). The cache is populated each turn from for_l2(...); populate-then-read happens in-process — no JSON serialisation, no envelope traversal. A ContextVar at oracle.server.llm.apollo_cache (get_cache() / set_cache() / reset_cache() / populate_for_turn() helpers) keeps per-request isolation so the cache cannot leak across concurrent /chat requests.
  • Tool-executor integration: in oracle/server/llm/tool_executor.py, before each LLM turn:
      • Call for_l2(user, intent_class, caller_tags, trace_id) to compute the applicable guidance for the current turn (or read it from the request-scoped cache if already populated upstream by the route handler).
      • Fold cache.get_system_prompt_additions(intent_context) into the system prompt and cache.get_tool_description_overrides(...) into the tool catalog rendering before invoking the configured provider.
      • After tool dispatch returns, consult cache.get_tool_pairing_hints(current_tool) to surface follow-up suggestions to the LLM if the configured provider supports tool nudges.
  • Cache lifetime: request-scoped. Populated at the top of /chat request handling; the contextvar resets when the request returns or the handler raises. The cache holds the same artifact set across every turn of a single tool-use loop — no re-fetch per turn — so artifact applicability decisions don't shift mid-loop.
  • Failure posture. Identical to L1/L3: cache miss / for_l2 timeout / accessor exception → tool-executor proceeds with no guidance applied. The /chat request still succeeds. Counters increment (apollo_guidance_attach_timeout_total{scope="l2"}), no exception bubbles up.
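The request-scoped isolation described in this list can be sketched with the standard library alone. This is a minimal illustration, not oracle's module: a plain dict stands in for ApolloGuidanceCache, and handle_chat is a hypothetical stand-in for the /chat handler. The helper names mirror the get_cache() / set_cache() / reset_cache() helpers named above.

```python
import asyncio
from contextvars import ContextVar, Token
from typing import Optional

# Stand-in for ApolloGuidanceCache: a plain dict is enough for the sketch.
_cache_var: ContextVar[Optional[dict]] = ContextVar("apollo_cache", default=None)


def set_cache(cache: dict) -> Token:
    return _cache_var.set(cache)


def get_cache() -> Optional[dict]:
    return _cache_var.get()


def reset_cache(token: Token) -> None:
    _cache_var.reset(token)


async def handle_chat(request_id: str, fail: bool = False) -> Optional[dict]:
    """Hypothetical handler: populate once, reset on return or raise."""
    token = set_cache({"request": request_id})
    try:
        await asyncio.sleep(0)  # yield so concurrent requests interleave
        if fail:
            raise RuntimeError("handler failed")
        return get_cache()
    finally:
        reset_cache(token)


async def demo() -> list:
    # gather() runs each coroutine as its own task with a copied context,
    # so one request's cache is never visible to another.
    results = await asyncio.gather(
        handle_chat("a"), handle_chat("b"), handle_chat("c", fail=True),
        return_exceptions=True,
    )
    return results + [get_cache()]
```

Running asyncio.run(demo()) shows each request reading only its own cache, and the caller's context still holding no cache afterwards, including for the request that raised.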

Tests.

  • Oracle unit (attacher): for_l2(user="...", intent_class="...", trace_id="...") returns the same AttachedGuidance shape as for_l1 / for_l3_agent; honors APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS; records attribution under the l2 scope.
  • Oracle unit (cache): oracle.server.llm.apollo_cache contextvar isolates concurrent /chat requests; reset-on-return is enforced even when the request raises.
  • Oracle integration: with a populated guidance set including a PromptShim, the system prompt sent to the configured tool-executor provider observably grows; with for_l2 returning None, the prompt is unchanged. Same shape of assertion as M14's cortex integration test.
  • Oracle failure posture: with APOLLO_GUIDANCE_ATTACH_ENABLED=false or with the attacher raising, /chat still serves the request, the LLM call still happens, no apollo_guidance is folded in.
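The failure posture these tests exercise can be sketched as a budget-bounded wrapper. This is an assumption-laden illustration: attach_or_none, the Counter-based metrics stand-in, and the apollo_guidance_attach_error_total name are hypothetical; only apollo_guidance_attach_timeout_total and the millisecond budget come from this plan.

```python
import asyncio
from collections import Counter
from typing import Awaitable, Callable, Optional

counters: Counter = Counter()  # stand-in for the real metrics registry


async def attach_or_none(
    compute: Callable[[], Awaitable[dict]],
    timeout_ms: int,
    scope: str = "l2",
) -> Optional[dict]:
    """Run `compute` under a time budget; any failure degrades to None."""
    try:
        return await asyncio.wait_for(compute(), timeout=timeout_ms / 1000)
    except asyncio.TimeoutError:
        counters[("apollo_guidance_attach_timeout_total", scope)] += 1
        return None
    except Exception:
        # Attacher raised: the chat request must still succeed.
        # The counter name below is hypothetical.
        counters[("apollo_guidance_attach_error_total", scope)] += 1
        return None


async def slow_guidance() -> dict:
    await asyncio.sleep(0.5)
    return {"artifacts": []}


async def fast_guidance() -> dict:
    return {"artifacts": ["a1"]}


async def broken_guidance() -> dict:
    raise ValueError("attacher failed")
```

The caller treats None as "no guidance" and proceeds with the unmodified prompt, so neither a timeout nor an attacher exception can fail the /chat request.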

Acceptance.

  • Oracle's tool-executor consumes guidance via ApolloGuidanceCache on every /chat turn.
  • Tests prove that an oracle-side LLM call observably changes when guidance is present vs. absent, mirroring M14's L3-side proof.
  • Oracle's existing M3 attach behavior on /chat responses (L1 attach) is unchanged; M3, M5, M11, M13 tests still pass.
  • Apollo's MiniMax LLM at /api/v1/apollo/chat is unchanged. The two LLMs remain independent.

Rollback. Revert the tool-executor patch and the for_l2 attacher addition. The /chat surface continues to work with no guidance applied — same posture as before M15.

Out of scope (this milestone).

  • Per-provider prompt formatting. The 5 providers in oracle's tool-executor have slightly different system-prompt conventions; M15 folds guidance via the existing system-prompt assembly path rather than introducing per-provider rendering. Provider-specific tuning is a follow-up if measurable wins surface.
  • Sharing the L2 cache across /chat and /api/v1/apollo/chat. Apollo's admin chat runs Apollo's MiniMax LLM and consults its own caches/state; M15 leaves it untouched. The two surfaces remain independent.


Cross-cutting concerns (applied in every milestone)

Testing

Every milestone lands with the subset of design §Test Expectations that applies to the code it introduces. No milestone merges without green tests for its own scope.

At every milestone boundary, run the full test suite across every repo the milestone touches — not just the newly-added Apollo tests. In practice:

  • oracle/.venv/bin/python -m pytest (picks up tests/ + apollo/tests/ via the testpaths config).
  • axonis-core/ — its own pytest run whenever the milestone added or changed anything in axonis-core (e.g., a new Schema.INDICES entry).

Report the pass count per repo in the milestone summary. Apollo milestones frequently touch axonis-core with additive changes; running the full suite catches regressions at the boundary rather than letting them propagate.

Observability

Every counter named in the design spec is registered with a zero value at startup (even before the code that drives it exists). Dashboards can be built as soon as Milestone 1 lands; they simply display zero for unused counters.
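A minimal sketch of the register-at-zero rule, assuming a hand-rolled registry rather than whichever metrics library oracle actually uses. CounterRegistry and the second counter name are hypothetical; apollo_guidance_attach_timeout_total is the one name taken from this plan.

```python
DECLARED_COUNTERS = (
    "apollo_guidance_attach_timeout_total",  # named in this plan
    "apollo_example_unused_total",           # hypothetical placeholder
)


class CounterRegistry:
    """Every declared counter exists at zero before any code increments it."""

    def __init__(self, declared):
        self._values = {name: 0 for name in declared}

    def inc(self, name: str, amount: int = 1) -> None:
        if name not in self._values:
            # Undeclared counters are a bug: they would be missing from
            # dashboards until the first increment.
            raise KeyError(f"undeclared counter: {name}")
        self._values[name] += amount

    def snapshot(self) -> dict:
        return dict(self._values)
```

Rejecting undeclared names is a design choice in this sketch: it keeps the declared list and the emitted series from drifting apart.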

Logging

No Apollo module uses a module-local logging.getLogger() call. Every source file that emits log lines imports the three canonical loggers:

from axonis.core.logger import log, error, audit

This is the axonis-core implementation of the athena logging convention (athena/athena/logger.py) — same three-logger pattern, same format, same rotating-file handlers. Using the shared module guarantees Apollo's log lines interleave cleanly with oracle's and every other axonis service's when aggregated.

Per-file rule:

  • log: routine telemetry (info, warning, debug).
  • error: exceptions, permanent failures, data-loss events, misconfiguration. Must be used for every code path that surfaces a durable failure regardless of whether the exception is re-raised.
  • audit: important transactions that must be independently traceable. See component.oracle.apollo-APOLLO §Logging → What counts as audit-worthy for the enumerated list. New milestones add to that list when they introduce new state-changing operations (e.g., M9's Curator actions, M11's admin-chat mutations, M12's autonomous commits).

Test discipline: milestone tests that assert on logging output import from axonis.core.logger (or monkeypatch it); they never instantiate a bare logging.Logger. A lint-level AST check catches stray logging.getLogger() calls in Apollo source.
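The lint-level AST check could look roughly like the following. This is a sketch under assumptions: find_getlogger_calls is a hypothetical name, and the real check may match more patterns (aliased imports, for example) than this one does.

```python
import ast


def find_getlogger_calls(source: str, filename: str = "<src>") -> list[int]:
    """Return line numbers of logging.getLogger(...) / getLogger(...) calls."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Matches `logging.getLogger(...)` and any `<x>.getLogger(...)`.
        if isinstance(func, ast.Attribute) and func.attr == "getLogger":
            hits.append(node.lineno)
        # Matches a bare `getLogger(...)` after `from logging import getLogger`.
        elif isinstance(func, ast.Name) and func.id == "getLogger":
            hits.append(node.lineno)
    return sorted(hits)
```

Because the source is only parsed, never imported, the check can run over Apollo files in CI without installing their dependencies.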

Invariants enforcement

Every milestone that touches Curator-adjacent code enforces the hard invariants from §Curator → Disallowed actions in unit tests. Example: attempting a Curator action that would read another user's conversation data must raise CuratorPolicyViolation regardless of which code path invokes it.

Index mapping versioning

Every Elastic index mapping declares a schema_version field, and every document is written with schema_version: 1. If a future milestone changes a mapping, it bumps schema_version and ships a migration plan in the PR; no implicit mapping changes.

Settings discipline

No milestone reads os.environ directly. Every configuration value flows through oracle/apollo/settings.py, which is the sole in-code reader of env vars. This keeps the surface area for config changes bounded and testable.

Deployment environment inheritance

Apollo's platform-level dependencies — Elastic, Redis, SSO, log-level/workspace, federation — come from the deployment env files in developers-environment/conf/ (one .env per target: development.axonis.ai.env, matrix.axonis.ai.env, edge.axonis.ai.env, vector.axonis.ai.env, etc.). No milestone redefines, shadows, or duplicates those variables with an Apollo-specific equivalent. The full inheritance contract lives in design-spec §Environment Configuration.

Every APOLLO_* variable also lives in the shared env file. The canonical home for Apollo configuration is developers-environment/conf/development.axonis.ai.env (plus target-specific overrides in the same directory). oracle/apollo/settings.py reads them via os.getenv(...) with defaults that match the env-file values — so the codebase still comes up if the env isn't sourced, but the authoritative source for every operator-facing knob is the deployment env file, not the Python module.

When a milestone adds a new APOLLO_* variable:

  1. Register it in oracle/apollo/settings.py with its default.
  2. Add it to developers-environment/conf/development.axonis.ai.env in the appropriate subsystem block, with the same default.
  3. Document it in design-spec §Apollo-owned variables (the grouped list).

All three land in one commit. A milestone that only touches settings.py is incomplete.
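The "same default in both places" rule lends itself to an automated drift check. The sketch below is hypothetical (neither function exists in the plan), and the timeout default of 50 is an illustrative value, not the real one; APOLLO_LINEAGE_RETENTION_DAYS defaulting to 90 is the only default stated in this document.

```python
def parse_env_file(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines; skips blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"')
    return values


def find_default_drift(defaults: dict[str, str], env_text: str) -> dict[str, tuple]:
    """Return APOLLO_* names whose settings default differs from the env
    file, or that are missing from the env file entirely."""
    env = parse_env_file(env_text)
    return {
        name: (default, env.get(name))
        for name, default in defaults.items()
        if name.startswith("APOLLO_") and env.get(name) != default
    }


# Illustrative inputs: the settings defaults and an env file missing one entry.
settings_defaults = {
    "APOLLO_LINEAGE_RETENTION_DAYS": "90",
    "APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS": "50",  # value is an assumption
}
env_file_text = "# apollo lineage\nAPOLLO_LINEAGE_RETENTION_DAYS=90\n"
drift = find_default_drift(settings_defaults, env_file_text)
```

A non-empty result is exactly the "milestone that only touches settings.py" case that the three-step rule treats as incomplete.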


Out of scope for this plan

The following work is intentionally deferred; each would be a separate plan once the corresponding prerequisites land:

  • Consolidation of oracle's existing memory modules (§Deferred in the design spec). Apollo is additive throughout all 15 milestones; oracle/server/memory/* and oracle/server/models/memory.py are untouched.
  • Keycloak client-credentials grant for service-to-service auth. Blocks background/batch ingest workers; not required for user-request-context ingest, which is all Milestones 1–15 need.
  • Additional L3 emitter onboarding beyond oracle + cortex. Parallax, UDS, athena, testament, titan, rest/fedai-rest onboard in follow-up work. Per design spec §Integration Backlog, each service is made visible to Apollo via one of two paths: (a) in-process relay (default) — when oracle MCP-dispatches to the service, oracle observes the round-trip and emits on its behalf with no code change in the service beyond its component_kind declaration; or (b) direct POST via ApolloClient — when the service's outputs are not observable through an oracle-mediated MCP round-trip. No Apollo code change required either way. Subscriber-side wiring for any of these services follows the M14 cortex pattern (request-scoped ApolloGuidanceCache populated from arguments.apollo_guidance).
  • Required-mode flips (APOLLO_REQUIRE_INTENT_SCHEMA, APOLLO_REQUIRE_TRACEPARENT). Stay false through these milestones; flipping to true is a post-Milestone-15 ops decision once coverage is proven.
  • Production-grade minimax-local LLM provider. M8 ships the minimax-local provider as a scaffold: it honors the canonical HuggingFace load signature (AutoTokenizer.from_pretrained("MiniMaxAI/MiniMax-M2.7", trust_remote_code=True) + AutoModelForCausalLM.from_pretrained("MiniMaxAI/MiniMax-M2.7", trust_remote_code=True)) and resolves weights from the HF cache at ${HF_HOME:-~/.cache/huggingface}/hub/models--MiniMaxAI--MiniMax-M2.7/. The following production-hardening work is deferred (see design spec §Apollo's LLM → Local MiniMax via HuggingFace for the contract):
      • APOLLO_LLM_LOCAL_MODEL_PATH env override for operator-provided absolute paths (e.g., a mounted shared filesystem holding a custom MiniMax fine-tune).
      • Thread-pool / process-pool offload of the synchronous HF forward pass so the event loop isn't blocked.
      • Device mapping + quantization knobs (device_map="auto", torch_dtype, bitsandbytes 4-/8-bit settings) as env-configurable passthroughs.
      • Pre-pull orchestration + readiness gate: block APOLLO_LLM_PROVIDER=minimax-local instances from serving until the checkpoint is resident on disk and a warm-up forward pass has succeeded.
      • Streaming token output through the provider abstraction (admin chat UX).

Until these land, APOLLO_LLM_PROVIDER=openai with APOLLO_LLM_BASE_URL pointed at a hosted MiniMax endpoint is the default production pattern; minimax-local is a dev / air-gapped-lab fallback.


Verification at completion

When all 15 milestones are green:

  1. uv run pytest in oracle passes, including every test from design §Test Expectations.
  2. A full request lifecycle from L1 /chat → oracle → cortex → oracle → L1 produces:
      • Observations at every boundary under a single trace_id, all emitted by oracle in-process (cortex's source tree contains no Apollo emission code).
      • apollo_guidance attached to the /chat response and to the MCP dispatch, plus consulted in-process by oracle's own chat LLM (L2 path).
  3. Cortex consumes the attached guidance via ApolloGuidanceCache (M14): integration tests show the system prompt sent to a downstream LLM call inside a cortex tool grows when guidance carries a PromptShim. Oracle's chat LLM consumes guidance via the L2 in-process path (M15): the system prompt assembled by oracle/server/llm/tool_executor.py observably grows by the same amount when guidance is present. The L1 attach side is live but the L1 subscriber wiring (beacon) is deferred; see M14 §Out of scope.
  4. If synthetic failure signals are injected, the Evaluator demotes relevant artifacts within N cycles; the next attached MCP dispatch and the next oracle L2 cache populate both reflect the demotion, and the next cortex and oracle LLM calls observe the change.
  5. Admin can: inspect observations, view lineage, trigger synthesis, explain Curator decisions conversationally, roll back artifacts, pause/resume the Curator.
  6. The maintenance job runs hourly; expired observations are purged; snapshot coarsening tiers work.
  7. /stats exposes every metric named in the design spec.
  8. When Apollo is unreachable, oracle still serves /chat and dispatches tools; apollo_guidance is omitted from envelopes; counters surface the degraded state.

Post-M15 — Prioritization Layers (shipped 2026-05-18)

The seven-layer prioritization rebuild is post-milestone work. It doesn't extend the M0–M15 build order — it overhauls how the attacher chooses which artifacts to send. Each layer is independently disabled by an env flag; none changes the underlying observation/synthesis/curator/evaluator model M0–M14 produced.

Layer | Surface change
--- | ---
1 | Capped artifacts get kind: "capped" lineage rows. New endpoints GET /lineage/capped and GET /artifacts/{id}/stats.
2 | Five-tier sort key (evaluator_score → confidence → applicability specificity → weight → recency) + per-type attach caps.
3 | Promote preserves evaluator_score, confidence, weight through _content_from_proposal. Contract pinned by test.
4-A | Evaluator writes content.evaluator_score back to the artifact after every signal application.
4-B | Synthesis prompts require confidence: 0.0..1.0; _normalize_confidence clamps + defaults.
5 | rationale_summary names attached + capped artifact IDs per type. aggregate_artifact_stats query.
6-A + 6-B | Promote computes an embedding and surfaces similar active artifacts as an advisory in the response.
6-C | New coalescer background loop clusters near-duplicates and queues LLM-merged proposals with supersedes: [...]. Off by default.

Full contract in design spec §Prioritization Layers. Backlog (the cap-defaults empirical study, §12.9 in docs/APOLLO-FUTURE-IMPROVEMENTS.md) waits on accumulated production telemetry rather than additional code.
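Layer 2's ranking and capping can be illustrated with a tuple sort key. This is a sketch, not the attacher's code: the field names specificity, updated_at, and type, and the artifact type strings, are assumptions chosen for the example; only the tier order comes from the table above.

```python
from collections import defaultdict


def sort_key(artifact: dict) -> tuple:
    """Five-tier key; higher is better on every tier, so sort descending."""
    return (
        artifact.get("evaluator_score", 0.0),
        artifact.get("confidence", 0.0),
        artifact.get("specificity", 0),      # applicability specificity
        artifact.get("weight", 0.0),
        artifact.get("updated_at", 0.0),     # recency
    )


def select_with_caps(artifacts: list[dict], caps: dict[str, int]) -> tuple[list, list]:
    """Return (attached, capped) after ranking and applying per-type caps."""
    attached, capped = [], []
    used = defaultdict(int)
    for art in sorted(artifacts, key=sort_key, reverse=True):
        limit = caps.get(art["type"])  # None means the type is uncapped
        if limit is not None and used[art["type"]] >= limit:
            capped.append(art)
        else:
            used[art["type"]] += 1
            attached.append(art)
    return attached, capped


candidates = [
    {"id": "a", "type": "prompt_shim", "evaluator_score": 0.9, "confidence": 0.5},
    {"id": "b", "type": "prompt_shim", "evaluator_score": 0.9, "confidence": 0.8},
    {"id": "c", "type": "prompt_shim", "evaluator_score": 0.2, "confidence": 0.9},
    {"id": "d", "type": "failure_pattern", "evaluator_score": 0.1},
]
attached, capped = select_with_caps(candidates, {"prompt_shim": 2})
```

Tuple comparison gives the tie-breaking for free: confidence only matters when evaluator_score is equal, and so on down the tiers. The capped list is what Layer 1 would record as kind: "capped" lineage rows.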

Post-M15 — Longevity surfaces (shipped 2026-05-19)

Two operator-facing surfaces designed to answer "how is Apollo holding up over time", complementing the per-trace observability M9-M14 already provided.

  • Effectiveness rollup. New GET /effectiveness/summary?window=1d|7d|30d|90d (apollo/effectiveness.py). One read-only call aggregates observations (by event_type / service / caller_kind), synthesis (proposals + status mix + avg confidence + histogram), curator (audit-row counts by action), attach pressure (attached vs capped + per-service breakdown), artifact inventory (active + by-type + embedding coverage), and evaluator queue depth. Each section is independently failure-tolerant: a broken store returns a zero shape rather than poisoning the rest of the response.
  • Receiver-side persistence. ApolloGuidanceCache(persist_path=...) (axonis-core/axonis/apollo/guidance_cache.py). Successful update() atomically writes the snapshot to disk; the constructor reads it back so L1/L3 receivers serve last-known-good guidance across restarts. New last_updated_at(), is_stale(max_age_seconds), and snapshot(max_age_seconds=...) expose a serving_stale flag so an oracle outage that strands receivers on yesterday's cache is visible without grepping logs. Opt-in: omitting persist_path keeps the historical pure-in-memory behavior.
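The receiver-side persistence pattern (atomic write on update, read-back in the constructor, staleness check) can be sketched as follows. PersistentSnapshot is a hypothetical stand-in, not ApolloGuidanceCache; the on-disk JSON layout is an assumption, and only persist_path and is_stale(max_age_seconds) mirror names from the text.

```python
import json
import os
import tempfile
import time
from typing import Optional


class PersistentSnapshot:
    def __init__(self, persist_path: Optional[str] = None):
        self._path = persist_path
        self.data: dict = {}
        self._updated_at: Optional[float] = None
        if persist_path and os.path.exists(persist_path):
            try:
                with open(persist_path, "r", encoding="utf-8") as fh:
                    saved = json.load(fh)
                self.data = saved["data"]
                self._updated_at = saved["updated_at"]
            except (OSError, ValueError, KeyError, TypeError):
                pass  # unreadable snapshot: start empty rather than fail

    def update(self, data: dict) -> None:
        self.data, self._updated_at = data, time.time()
        if not self._path:
            return  # opt-in: no path means pure in-memory behavior
        directory = os.path.dirname(os.path.abspath(self._path))
        # Write to a temp file in the same directory, then rename over the
        # target so readers never observe a half-written snapshot.
        fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump({"data": data, "updated_at": self._updated_at}, fh)
        os.replace(tmp, self._path)

    def is_stale(self, max_age_seconds: float) -> bool:
        if self._updated_at is None:
            return True
        return time.time() - self._updated_at > max_age_seconds
```

The temp file is created in the target's own directory because os.replace is only atomic when source and destination are on the same filesystem.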

Outstanding Spec Gaps

Authoritative home: SPEC-PLATFORM-14-APOLLO.md §Outstanding Items. This implementation plan no longer duplicates the open-issue register; the design spec is the single source of truth for what Apollo still owes against its mandates. Tier numbering, item status (OPEN / RESOLVED / WITHDRAWN), and resolution citations all live there.

This file retains:

  • The 15-milestone build order above (the "how" of getting Apollo into production).
  • The §Plan: axonis-core Bootstrap Idempotency Fix below (the concrete fix for Tier 2 items 5–6, which the design spec’s Outstanding Items cross-references back to).

When closing an Outstanding Item, update both: the design spec marks it RESOLVED with the commit ref; this plan adds a Post-M15 entry if the closure required new code paths worth a milestone-style writeup.

Plan: axonis-core Bootstrap Idempotency Fix

Concrete implementation plan for Tier 2 items 5–6 — the two axonis/elastic/manager.py idempotency bugs that block conduit + parallax boots against shared ES with partial state.

Repo: axonis-core (NOT oracle). Target branch: fusion-apollo (already merged from main as of fd11e19). Release target: v4.18.0 (semver patch — pure bugfix, no API change).

Step 1 — Tighten the existence check (root cause of both bugs)

The current _ensure_index uses any(k.startswith(index) for k in existing.keys()). This conflates the bare base name (data-fusion) with the date-stamped form (data-fusion-2026.06.04) and with sibling indices that happen to share a prefix (data-fusion-old). The right check is: does a date-stamped index for this base name exist? Anything else means we still need to create one.

File: axonis/elastic/manager.py

import re
from datetime import datetime  # used by _ensure_index / _ensure_alias below

from elasticsearch import BadRequestError, NotFoundError

_DATE_SUFFIX_RE = re.compile(r"^(?P<base>.+)-\d{4}\.\d{2}\.\d{2}$")


def _has_dated_index(base: str, existing: dict) -> bool:
    """True when `existing` carries at least one `<base>-YYYY.MM.DD` index.

    Bare-name indices (e.g. a manually-created `data-fusion` from a legacy
    workflow) do NOT count — bootstrap needs the date-stamped form so the
    `<base>-*` alias has something to point at.
    """
    for name in existing.keys():
        m = _DATE_SUFFIX_RE.match(name)
        if m and m.group("base") == base:
            return True
    return False


def _ensure_index(self, index: str, existing: dict) -> None:
    if _has_dated_index(index, existing):
        return
    dated = "-".join([index, datetime.today().strftime("%Y.%m.%d")])
    try:
        self.es.indices.create(
            index=dated,
            body=read_template(file_name=f"{index}_mapping.json"),
        )
        log.info(f"CREATING INDEX: {dated}")
    except BadRequestError as exc:
        # Multi-worker race: another HPA replica won the create. Treat
        # as idempotent success — the index now exists either way.
        if "resource_already_exists_exception" in str(exc):
            log.info(f"INDEX RACE OK: {dated} created by peer")
            return
        raise

This single change resolves Bug #5 (the race + the orphan-index re-create) AND eliminates the precondition for Bug #6 (parallax's bare data-fusion no longer short-circuits the create, so data-fusion-YYYY.MM.DD is created, and the subsequent put_alias(index="data-fusion-*", ...) has something to match).
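To make the difference between the two existence checks concrete, the sketch below runs the old prefix test and the date-suffix test side by side on the index names mentioned in Step 1. old_check and new_check are illustrative names; new_check restates _has_dated_index so the block is self-contained.

```python
import re

_DATE_SUFFIX_RE = re.compile(r"^(?P<base>.+)-\d{4}\.\d{2}\.\d{2}$")


def old_check(base: str, existing: dict) -> bool:
    """The prefix test Step 1 replaces."""
    return any(name.startswith(base) for name in existing)


def new_check(base: str, existing: dict) -> bool:
    """True only for an exact `<base>-YYYY.MM.DD` match."""
    for name in existing:
        m = _DATE_SUFFIX_RE.match(name)
        if m and m.group("base") == base:
            return True
    return False


# (index names present in ES) -> what each check reports for "data-fusion"
for names in (["data-fusion"], ["data-fusion-old"], ["data-fusion-2026.06.04"]):
    existing = {n: {"aliases": {}} for n in names}
    print(names, old_check("data-fusion", existing), new_check("data-fusion", existing))
```

The prefix test reports True for all three cases, while the date-suffix test reports True only for the date-stamped name, which is the behavior the bootstrap needs before it skips the create.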

Step 2 — Defensive alias self-heal

Belt-and-suspenders for the case where _ensure_index does create the dated index but ES's view of existing (snapshotted in bootstrap) is stale by the time we hit _ensure_alias. Catch NotFoundError and re-fetch:

def _ensure_alias(self, alias: str, index: str, existing: dict) -> None:
    if any(alias in info.get("aliases", {}) for info in existing.values()):
        return
    try:
        self.es.indices.put_alias(index=f"{index}-*", name=alias)
    except NotFoundError:
        # Bootstrap's `existing` snapshot was taken before _ensure_index
        # created the dated index. Re-resolve the actual index and put
        # the alias on it directly.
        dated = "-".join([index, datetime.today().strftime("%Y.%m.%d")])
        self.es.indices.put_alias(index=dated, name=alias)
    log.info(f"CREATING ALIAS: {alias}")

Step 3 — Regression tests

Add tests/test_elastic_bootstrap_idempotency.py. Use unittest.mock.MagicMock for the ES client so the tests are pure-unit and don't require a live cluster. Two scenarios:

"""Regression: bootstrap is idempotent against partial ES state.

Both scenarios reproduce production failures from 2026-06-04 (oracle
SPEC-PLATFORM-14-IMPLEMENTATION Tier 2 §5–6). Each pre-state should now
result in a clean boot — no exception propagates out of bootstrap().
"""
from unittest.mock import MagicMock
import pytest
from elasticsearch import BadRequestError, NotFoundError
from axonis.elastic.manager import ElasticManager


def _make_manager(get_alias_return: dict, create_raises=None,
                  put_alias_raises=None) -> ElasticManager:
    """Construct an ElasticManager wired to a MagicMock es client. Skips
    the __init__ network setup entirely. `create_raises` and
    `put_alias_raises` accept any MagicMock side_effect value (an exception,
    an iterable of per-call results, or a callable)."""
    mgr = ElasticManager.__new__(ElasticManager)
    es = MagicMock()
    es.indices.get_alias.return_value = get_alias_return
    es.indices.create.side_effect = create_raises
    es.indices.put_alias.side_effect = put_alias_raises
    mgr.es = es
    return mgr


def test_ensure_index_swallows_resource_already_exists_race():
    """Bug #5: when another HPA worker wins the create() race, the loser
    must not crash — both should converge on 'index exists'."""
    mgr = _make_manager(
        get_alias_return={},  # we see no index — try to create
        create_raises=BadRequestError(
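            # Assumes elasticsearch-py 8.x, where ApiError takes
            # (message, meta, body) and str(exc) reads meta.status.
            message="resource_already_exists_exception",
            meta=MagicMock(status=400),
            body={},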
        ),
    )
    # Must not raise; idempotent.
    mgr._ensure_index("data-ingest", existing={})


def test_ensure_index_creates_dated_when_only_bare_index_exists():
    """Bug #6 prerequisite: a legacy bare `data-fusion` index in ES (no
    date suffix, no alias) must NOT trick _ensure_index into thinking
    a usable index exists. The create must still run."""
    mgr = _make_manager(get_alias_return={"data-fusion": {"aliases": {}}})
    mgr._ensure_index("data-fusion", existing={"data-fusion": {"aliases": {}}})
    mgr.es.indices.create.assert_called_once()
    called_index = mgr.es.indices.create.call_args.kwargs["index"]
    assert called_index.startswith("data-fusion-")  # date-stamped
    assert called_index != "data-fusion"


def test_ensure_alias_self_heals_on_empty_pattern_match():
    """Bug #6: when put_alias hits 404 because the bootstrap's `existing`
    snapshot was taken before _ensure_index created the dated index,
    self-heal by targeting the freshly-created dated index by name."""
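    # A list side_effect raises its exception members and returns the rest,
    # one per call. A callable side_effect would return the exception
    # instance instead of raising it, so the 404 path would never run.
    mgr = _make_manager(
        get_alias_return={},
        put_alias_raises=[
            # Assumes the elasticsearch-py 8.x (message, meta, body) signature.
            NotFoundError(message="index_not_found_exception",
                          meta=MagicMock(status=404), body={}),
            None,
        ],
    )
    # Should not raise.
    mgr._ensure_alias("data-fusion", "data-fusion", existing={})
    # First attempt used the wildcard; the retry targeted the dated index.
    assert mgr.es.indices.put_alias.call_count == 2
    final_index = mgr.es.indices.put_alias.call_args.kwargs["index"]
    assert final_index.startswith("data-fusion-")
    assert final_index != "data-fusion-*"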
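Because these regression tests drive the ES client entirely through MagicMock side effects, it helps to be precise about how unittest.mock treats the two common side_effect forms. The short demonstration below uses a throwaway Boom exception (hypothetical, for illustration only).

```python
from unittest.mock import MagicMock


class Boom(Exception):
    pass


# Iterable side_effect: exception instances in the sequence are raised,
# other values are returned, one item consumed per call.
seq = MagicMock(side_effect=[Boom("first"), "ok"])
try:
    seq()
    first = "returned"
except Boom:
    first = "raised"
second = seq()

# Callable side_effect: whatever the callable returns becomes the mock's
# return value, even when that value is an exception instance.
fn = MagicMock(side_effect=lambda *args, **kwargs: Boom("not raised"))
returned = fn()
```

For a "fail once, then succeed" scenario, the iterable form is the one that actually exercises the except branch in the code under test.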

Step 4 — Release + downstream fanout

  1. Branch off fusion-apollo: git checkout -b fix/elastic-bootstrap-idempotency.
  2. Apply Step 1 + Step 2 to axonis/elastic/manager.py.
  3. Add Step 3 test file. Verify uv run pytest tests/test_elastic_bootstrap_idempotency.py -v passes.
  4. Open MR → axonis-core main. semantic-release publishes v4.18.0.
  5. Bump consumers (uv add 'axonis-core>=4.18.0' in each):
      • oracle/pyproject.toml
      • cortex/pyproject.toml
      • conduit/pyproject.toml
      • parallax/pyproject.toml
      • prism/pyproject.toml
  6. In each consumer, run uv lock --upgrade-package axonis-core to refresh the lockfile to v4.18.0.

Step 5 — Verification against the production scenarios

After the bump lands in oracle:

  1. Re-run the workflow suite without the harness workaround (_DEFAULT_SERVICES back to the full 5): uv run pytest -m workflow -v
  2. Expected: conduit boots cleanly even with the stale data-ingest-2026.06.04 still in ES; parallax boots cleanly against the bare data-fusion (24 docs).
  3. test_workflow_oracle_to_parallax::test_guidance_injected_into_parallax_dispatch should now run (was skipped pending parallax boot).
  4. Mark Tier 2 items 5 + 6 RESOLVED, with the commit ref, in the design spec's §Outstanding Items (the authoritative register per §Outstanding Spec Gaps).

Risk + rollback

  • Risk: the tightened _has_dated_index check changes which indices count as "existing." A service that previously got away with a bare base-name index would now see a dated index created alongside it. The bare index becomes orphaned but not deleted. Mitigation: callout in the v4.18.0 release notes recommending operators clean up legacy bare indices once the alias is correctly attached to the new dated one.
  • Rollback: the change is contained to two private methods. Reverting the commit and re-tagging is straightforward; consumers that haven't yet bumped won't notice.

Function Flow Index

A developer reference cataloging every traversal in Apollo's runtime: which function calls which, with file:line citations and a brief reason. Use this as a debugging companion: pick a flow, follow the steps, identify where a request actually deviates from the documented path.

How to read. Each flow has a one-line trigger and a numbered list. Each step is shaped "caller (file:line) → callee — reason". Citations point at the actual call site. The deeper call-graph diagrams live in §Technical Overview (this spec); this section is the index.

Three groups:

  • A. Request-time hot paths: synchronous; runs on the user's /chat thread.
  • B. Background workers: async; off the request thread.
  • C. Admin / lifecycle: operator-driven or process-lifetime.

Path conventions:

  • oracle/... = the oracle repo root.
  • cortex/... = the cortex repo root.
  • apollo/... (alone) = oracle/oracle/....

A. Request-time hot paths

A1. /chat request — full lifecycle

Trigger: an L1 caller POSTs /chat to oracle.

POST /api/v1/chat is oracle's user-facing chat surface, driven by oracle's own LLM tool-use loop in oracle/server/llm/tool_executor.py (5-provider gateway). Today's caller is curl, an integration test, or a direct API client; beacon onboards once a beacon↔oracle connection is wired. /api/v1/chat is distinct from POST /api/v1/apollo/chat (Apollo's admin chat — see flow C1), which runs Apollo's independent MiniMax LLM.

  1. TraceparentMiddleware.__call__ (oracle/server/middleware/trace.py:56) — reads the inbound traceparent header, calls _trace.parse_traceparent (:67); on missing/malformed calls _trace.mint_traceparent (:85); installs the result via _trace.set_current_traceparent (:87) so every emit downstream stamps the same id.
  2. OAuthMiddleware.__call__ (axonis-core) — validates the Bearer token, writes request.state.token_payload so dependencies that need auth can read it.
  3. routes.chat (oracle/server/api/routes.py:106) — handler entry; reads body + token.
  4. routes.chat:147 calls apollo_chat.emit_user_prompt to record the L1-origin observation before any work begins.
  5. routes.chat calls ToolExecutor().run(...) — drives the LLM tool-use loop (see flow A2).
  6. routes.chat:171 calls apollo_chat.emit_final_response so Apollo records what the user is about to see.
  7. routes.chat:185 calls apollo_attacher.for_l1 to compose the apollo_guidance block (see flow A5).
  8. routes.chat returns ChatResponse(..., apollo_guidance=...) to L1.
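Step 1's parse-or-mint behavior can be sketched against the W3C Trace Context header format (version 00). The function names echo parse_traceparent and mint_traceparent from the flow, but the signatures and bodies here are assumptions, not the contents of oracle's trace module; resolve_traceparent is a hypothetical convenience wrapper.

```python
import re
import secrets
from typing import Optional

# version "00" - 32 hex trace-id - 16 hex parent-id - 2 hex flags
_TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse_traceparent(header: Optional[str]) -> Optional[str]:
    """Return the header if it is a valid version-00 traceparent, else None."""
    if not header:
        return None
    value = header.strip()
    m = _TRACEPARENT_RE.match(value)
    if not m:
        return None
    trace_id, parent_id, _flags = m.groups()
    if set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None  # all-zero ids are invalid in the W3C format
    return value


def mint_traceparent() -> str:
    """Create a fresh traceparent with the sampled flag set."""
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"


def resolve_traceparent(header: Optional[str]) -> str:
    """Use the inbound header when valid; otherwise mint a new one."""
    return parse_traceparent(header) or mint_traceparent()
```

Minting on missing or malformed input means every downstream emit always has an id to stamp, which is what lets the later steps correlate under a single trace.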

A2. Tool-use loop inside ToolExecutor.run

Trigger: routes.chat invoked the LLM tool-use loop.

  1. ToolExecutor.run (oracle/server/llm/tool_executor.py:89) calls the LLM provider router — gets the next LLM turn.
  2. ToolExecutor.run:156 calls apollo_chat.emit_llm_turn — records the L2-origin turn.
  3. If the LLM returned tool_calls, ToolExecutor.run:205 calls _call_backend_tool per call (see flow A3).
  4. After each call returns, ToolExecutor.run checks for an error envelope: on error → :220 calls apollo_chat.emit_tool_error; on success → :231 calls apollo_chat.emit_tool_output. Oracle observes the L3 round-trip and emits on the L3 service's behalf.
  5. Loop until the LLM stops emitting tool_calls; return text + tool_call trace to routes.chat.
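The loop in steps 1–5 can be sketched as a plain function with the LLM, the tool dispatcher, and the emitter passed in as callables. Everything here is illustrative: run_tool_loop, the message and turn shapes, and the max_turns guard are assumptions, not ToolExecutor.run's actual interface.

```python
from typing import Callable


def run_tool_loop(
    llm: Callable[[list], dict],
    call_tool: Callable[[str, dict], dict],
    emit: Callable[[str, dict], None],
    messages: list,
    max_turns: int = 8,  # hypothetical safety bound
) -> str:
    for _ in range(max_turns):
        turn = llm(messages)
        emit("llm_turn", turn)  # record the L2-origin turn
        tool_calls = turn.get("tool_calls") or []
        if not tool_calls:
            return turn.get("text", "")  # LLM stopped calling tools
        for call in tool_calls:
            result = call_tool(call["name"], call.get("arguments", {}))
            # The caller observes the round-trip and emits on the tool's behalf.
            kind = "tool_error" if "error" in result else "tool_output"
            emit(kind, {"tool": call["name"], "result": result})
            messages.append({"role": "tool", "name": call["name"], "content": result})
    return ""


# Scripted run: one tool call, then a final answer.
events: list[str] = []
script = iter([
    {"tool_calls": [{"name": "search", "arguments": {"q": "x"}}]},
    {"text": "done"},
])
answer = run_tool_loop(
    llm=lambda msgs: next(script),
    call_tool=lambda name, args: {"ok": True},
    emit=lambda kind, payload: events.append(kind),
    messages=[{"role": "user", "content": "hi"}],
)
```

Emitting inside the loop, rather than inside the tool, is what keeps the L3 service free of Apollo emission code in this arrangement.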

A3. Outbound MCP dispatch (_call_backend_tool)

Trigger: ToolExecutor.run decided to invoke an L3 tool.

  1. _call_backend_tool (oracle/server/llm/tool_executor.py:301) calls registry.get_tool_route (:308) to look up the L3 service base URL.
  2. _call_backend_tool:333 calls axonis.trace.get_current_traceparent and at :335 adds the value as an HTTP header so L3's logs correlate.
  3. _call_backend_tool:343 calls registry.get_tool to read component_kind. If :345 component_kind == "agent", :347 calls apollo_attacher.for_l3_agent (see A6) and injects the result into arguments["apollo_guidance"]. Library components skip the attach and proceed to step 4 without guidance.
  4. _call_backend_tool:356 opens httpx.AsyncClient(timeout=30.0) and POSTs the JSON-RPC tools/call body to <base_url>/agentspace/mcp.
  5. Returns the parsed response to ToolExecutor.run.
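The request that steps 2–4 assemble can be sketched as a pure function that builds the headers and the JSON-RPC tools/call body, without performing the HTTP POST. build_tools_call is a hypothetical helper; the body follows the JSON-RPC 2.0 shape MCP uses for tools/call, and the agent-only injection mirrors step 3.

```python
from typing import Any, Optional


def build_tools_call(
    tool_name: str,
    arguments: dict[str, Any],
    component_kind: str,
    guidance: Optional[dict],
    traceparent: Optional[str],
    request_id: int = 1,
) -> tuple[dict[str, str], dict[str, Any]]:
    """Return (headers, body) for an outbound MCP tools/call request."""
    args = dict(arguments)  # never mutate the caller's arguments
    # Only agents receive guidance; libraries are dispatched without it.
    if component_kind == "agent" and guidance is not None:
        args["apollo_guidance"] = guidance
    body = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": args},
    }
    headers = {"Content-Type": "application/json"}
    if traceparent:
        headers["traceparent"] = traceparent  # lets L3 logs correlate
    return headers, body
```

Keeping the assembly separate from the transport makes the agent/library branch easy to unit test without a live L3 service.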

A4. Cortex consumption — L3 side (M14)

Trigger: cortex's MCP server receives the JSON-RPC request from oracle.

Consumer-side wiring is now in axonis.apollo.ApolloMCPMiddleware (axonis-core) — installed once in cortex/server/__main__.py:153 and covers every @mcp.tool() cortex defines. The FastMCP handler at cortex/server/mcp/server.py is Apollo-unaware; the middleware handles popping + cache install + observation emit at the ASGI boundary.

  1. The middleware (axonis-core/axonis/apollo/mcp_middleware.py:__call__) intercepts every POST to /agentspace/mcp. It buffers the JSON-RPC body and calls _parse_mcp_request (:248) which pops arguments.apollo_guidance and returns the stripped arguments.
  2. The middleware re-serializes the request with the stripped arguments (_reserialize_with_stripped_arguments, :268) so the FastMCP handler never sees apollo_guidance as a stray kwarg.
  3. The middleware builds a fresh ApolloGuidanceCache, calls cache.update(extracted_guidance), and installs it on the request-scoped contextvar via axonis.apollo.request_scope.set_cache (:135). Empty / missing guidance still installs an empty cache so accessor calls return [] cleanly.
  4. The middleware forwards the (now stripped + cached) request to FastMCP, which routes to the registered tool function (e.g., intelligence_create, draft_narrative_with_evidence).
  5. The tool body calls axonis.apollo.get_cache() (or, equivalently, cortex.tools.ai_support._get_apollo_cache()); the helper _format_apollo_guidance (cortex/ai_support_tools.py:81) reads get_system_prompt_additions, get_active_failure_patterns, get_spec_fragments and renders a ## Apollo Guidance section. Empty accessors omit their sub-section; an entirely empty cache produces a byte-identical pre-M14 prompt.
  6. The tool calls LLMClient.complete(messages=[{"content": prompt}]) with the augmented prompt.
  7. The middleware's finally block calls reset_cache(token_cache) (mcp_middleware.py:151) — contextvar cleared whether the tool returned or raised. No cross-request leakage.
  8. On the way out, the middleware captures the JSON-RPC response (captured_send at :143) and emits a tool_output or tool_error observation back to oracle via ApolloClient.emit (:177).
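
The pop-and-reserialize step (steps 1 and 2 above) can be sketched as follows. This is a minimal illustration rather than the middleware's code: strip_guidance is a hypothetical helper name, the body is assumed to be a standard JSON-RPC tools/call request carrying params.arguments, and the guidance payload is assumed to be a dict.

```python
import json
from typing import Any


def strip_guidance(raw_body: bytes) -> tuple[bytes, dict[str, Any]]:
    """Pop arguments.apollo_guidance from a JSON-RPC tools/call body.

    Hypothetical helper: returns the re-serialized body without the
    guidance key, plus the guidance that was attached (empty if none).
    """
    request = json.loads(raw_body)
    arguments = request.get("params", {}).get("arguments", {})
    # pop() mutates the nested dict, so `request` is stripped in place.
    guidance = arguments.pop("apollo_guidance", None) or {}
    return json.dumps(request).encode("utf-8"), guidance


body = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "intelligence_create",
        "arguments": {"topic": "x", "apollo_guidance": {"artifacts": ["a1"]}},
    },
}).encode("utf-8")

stripped, guidance = strip_guidance(body)
```

A request with no guidance yields an empty dict, which matches the "empty cache still installs cleanly" behavior described in step 3.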

A5. L1 guidance attach (attacher.for_l1)

Trigger: routes.chat is about to serialize the response.

  1. apollo/guidance/attacher.py:161 for_l1(...) calls _attach(layer="l1", ...) (:285).
  2. _attach calls selectors.match_artifacts(layer="l1", ...) (apollo/guidance/selectors.py:40) and bounds the work by APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS.
  3. On overshoot, _attach increments metrics.GUIDANCE_ATTACH_TIMEOUT_TOTAL.labels(scope="l1") and returns None so the response serializes without the field.
  4. On success, if a trace_id was supplied, _attach calls two attribution writes (both wrapped in defensive try/except, so neither can break the chat response):
     • In-memory: oracle.evaluator.attribution.get().record(trace_id, scope, artifact_ids). Fast, TTL-bounded by APOLLO_GRAPH_TRACE_STATE_TTL_SEC, used by the Evaluator's hot-path signal correlation.
     • Persistent (§7.3): oracle.lineage.persist_attach(trace_id, scope, artifact_ids). Schedules a fire-and-forget asyncio task that writes one row per (trace_id, scope, artifact_id) to apollo_lineage_events. Retained for APOLLO_LINEAGE_RETENTION_DAYS (default 90). Powers retroactive /lineage queries over older traffic.
  5. Returns {as_of, artifacts, rationale_summary} to routes.chat.

Steps 1–5 are the L1 attach side: oracle composes guidance for the response envelope. The L1 consumer side (beacon-style clients reading apollo_guidance from the response and calling ApolloGuidanceCache.update(...) locally) waits on the beacon↔oracle connection design.
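
The timeout posture of the attach path (bound the selector, count the overshoot, return None) can be sketched with asyncio.wait_for. This is an illustration under assumptions: bounded_attach and the two selectors are hypothetical names, and a plain dict stands in for the labelled metrics counter.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Stand-in for a labelled counter such as GUIDANCE_ATTACH_TIMEOUT_TOTAL.
timeouts_total: dict[str, int] = {}


async def bounded_attach(
    select: Callable[[], Awaitable[list]],
    timeout_ms: int,
    scope: str = "l1",
) -> Optional[dict]:
    """Run the selector under a time budget; return None on overshoot."""
    try:
        artifacts = await asyncio.wait_for(select(), timeout=timeout_ms / 1000)
    except asyncio.TimeoutError:
        timeouts_total[scope] = timeouts_total.get(scope, 0) + 1
        return None  # the caller serializes the response without the field
    return {"artifacts": artifacts}


async def fast_selector() -> list:
    return ["artifact-1"]


async def slow_selector() -> list:
    await asyncio.sleep(0.2)
    return ["artifact-2"]
```

Returning None rather than raising keeps the chat response independent of guidance availability.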

In parallel, the same guidance set is consumed by oracle's own chat LLM via the L2 in-process path: attacher.for_l2(...) populates a process-local cache that oracle/server/llm/tool_executor.py reads on each tool-use turn. No transport — oracle hosts Apollo, so the cache is just a Python object passed by reference. See flow A7 below.

A6. L3 guidance attach (attacher.for_l3_agent)

Trigger: _call_backend_tool is about to POST to an agent-kind L3 service.

apollo/guidance/attacher.py:122 for_l3_agent(...) calls the same _attach(...) path (:173) but with layer="l3" and scope_label="l3:<service>". selectors.match_artifacts filters on applicability.layer == "l3" and the target service_name. Library-kind dispatches never reach this function — _call_backend_tool:345 filters them out.

A7. L2 guidance consumption — oracle's own chat LLM (M15)

Trigger: oracle/server/llm/tool_executor.py is about to assemble the prompt for the next tool-use turn during a /chat request.

  1. Tool-executor calls attacher.for_l2(user, intent_class, caller_tags, trace_id) with layer="l2" and scope_label="l2". selectors.match_artifacts filters on applicability.layer == "l2".
  2. Returned AttachedGuidance payload populates a request-scoped ApolloGuidanceCache (held in a ContextVar at oracle.server.llm.apollo_cache so concurrent /chat requests don't share cache state).
  3. Before the provider call, tool-executor reads cache.get_system_prompt_additions(intent_context) and folds the strings into its system prompt; reads cache.get_tool_description_overrides(...) and applies them to the tool catalog rendering.
  4. After tool dispatch returns, tool-executor consults cache.get_tool_pairing_hints(current_tool) for follow-up suggestions.
  5. On for_l2 timeout / failure: cache stays empty for the turn, tool-executor proceeds with its baseline prompt — the /chat request still succeeds (failure posture mirrors L1/L3).

No transport — oracle hosts Apollo, so the cache is a Python object passed by reference within the same process. The L2 path is symmetric with L1 (response-attach) and L3 (MCP-arg-attach) in artifact applicability filtering and timeout budget; it differs only in transport.
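
Why a ContextVar keeps concurrent /chat requests from sharing cache state can be shown with two interleaved asyncio tasks. This is a sketch only: _cache and handle_chat are hypothetical stand-ins, and the cache is modeled as a plain dict.

```python
import asyncio
from contextvars import ContextVar

# Hypothetical stand-in for the request-scoped cache holder.
_cache: ContextVar[dict] = ContextVar("apollo_cache")


async def handle_chat(request_id: str, additions: list[str]) -> str:
    # Each asyncio task runs in its own copy of the context, so a set()
    # made by one request is invisible to the others.
    token = _cache.set({"system_prompt_additions": additions})
    try:
        await asyncio.sleep(0)  # yield so the two requests interleave
        extra = " ".join(_cache.get()["system_prompt_additions"])
        return f"[{request_id}] base prompt. {extra}".strip()
    finally:
        _cache.reset(token)  # cleared whether the turn returned or raised


async def main() -> list[str]:
    return await asyncio.gather(
        handle_chat("r1", ["Prefer tool A."]),
        handle_chat("r2", []),
    )


results = asyncio.run(main())
```

The second request sees only its own (empty) additions and falls back to the baseline prompt.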

B. Background workers

B1. Observation drain (per envelope)

Trigger: oracle.observer.ingest._queue.put_nowait was called by an emit helper or a secondary-path POST.

  1. _drain_worker (apollo/observer/ingest.py:188) calls _queue.get (:201) to dequeue the next envelope.
  2. _drain_worker:207 calls _is_duplicate (:380); on a duplicate, it calls _queue.task_done (:209) and continues with the next envelope.
  3. _drain_worker:213 calls _write_with_retry (:342) — bounded retries on transient ES failures, eventually _default_writer (:402) → ApolloObservations.create.
  4. _drain_worker:218 calls extractors_module.apply(envelope, graph_set) (apollo/learner/extractors.py:46) — mutates the five Decision Graphs deterministically (see B2).
  5. _drain_worker:219 calls graph_set.drain_all_dirty and :221 invokes the _graph_writer callback (_default_graph_writer at :414) to persist any dirty nodes/edges to apollo_graph_nodes / apollo_graph_edges.
  6. _drain_worker calls SynthesisEngine().schedule(envelope) (apollo/learner/synthesis.py:135) — fires the LLM if event type triggers (see B3).
  7. _drain_worker calls _evaluate_envelope(envelope) (apollo/observer/ingest.py:280) — runs the Evaluator pipeline (see B5). Synthesis runs before the evaluator on each envelope; either failing leaves observation persistence intact (both wrapped in try/except).
  8. On the worker's exception path, _drain_worker:269 calls _dead_letter to write the envelope as JSONL if APOLLO_INGEST_DEAD_LETTER_PATH is set.
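
The overall shape of the drain loop (dedup, bounded retry, defensive later stages, dead-letter on failure) can be sketched as below. All names are hypothetical stand-ins, in-memory lists replace Elasticsearch and the JSONL file, and the graph and synthesis steps are omitted for brevity.

```python
import asyncio

seen: set[str] = set()          # dedup window stand-in (unbounded here)
written: list[dict] = []        # stand-in for the observations index
dead_letters: list[dict] = []   # stand-in for the JSONL dead-letter file


async def write_with_retry(envelope: dict, attempts: int = 3) -> None:
    """Bounded retries; the last failure propagates to the worker."""
    for attempt in range(attempts):
        try:
            if envelope.get("fail_write"):     # simulate a persistent outage
                raise ConnectionError("ES unavailable")
            written.append(envelope)
            return
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(0)             # backoff placeholder


def evaluate(envelope: dict) -> None:
    if envelope.get("bad_eval"):
        raise ValueError("evaluator failed")


async def drain_worker(queue: asyncio.Queue) -> None:
    while True:
        envelope = await queue.get()
        try:
            if envelope["id"] in seen:
                continue                       # duplicate: skip the write
            seen.add(envelope["id"])
            await write_with_retry(envelope)
            try:
                evaluate(envelope)
            except Exception:
                pass                           # the persisted write stays intact
        except Exception:
            dead_letters.append(envelope)
        finally:
            queue.task_done()                  # runs on every path, including continue


async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    worker = asyncio.create_task(drain_worker(queue))
    for env in ({"id": "a"}, {"id": "a"}, {"id": "b", "fail_write": True},
                {"id": "c", "bad_eval": True}):
        queue.put_nowait(env)
    await queue.join()
    worker.cancel()
    await asyncio.gather(worker, return_exceptions=True)


asyncio.run(main())
```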

B2. Decision-graph update (deterministic, no LLM)

Trigger: extractors.apply runs inside the drain worker, post-write.

  1. extractors.apply (apollo/learner/extractors.py:46) calls each per-graph extractor: _extract_intent_tool (:114), _extract_prompt_shape (:158), _extract_service_routing (:186), _extract_outcome (:219), _extract_iteration (:270).
  2. Each extractor calls graph_set.graph(graph_id) (apollo/learner/graphs.py:291) to get the right DecisionGraph.
  3. Each extractor calls graph.upsert_node(kind, label, trace_id, at, ...) (apollo/learner/graphs.py:146) and graph.upsert_edge(source, target, trace_id, at, success) — idempotent per (graph_id, kind, label) and per-trace.
  4. upsert_edge updates weight_short and weight_long EWMAs using APOLLO_GRAPH_EWMA_SHORT / APOLLO_GRAPH_EWMA_LONG. First observation pins both windows to the observation value (no cold-start artifact).
  5. Some extractors call graph_set.trace_scratch(trace_id) (:307) to stash intent/service info for later events on the same trace to stitch into.
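
The dual-window EWMA update in step 4 can be illustrated as follows. This sketch assumes the two settings are smoothing factors in (0, 1]; the values 0.5 and 0.1 and the EdgeWeights class are illustrative, not the configured defaults or the real edge type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EdgeWeights:
    alpha_short: float = 0.5   # assumed value for APOLLO_GRAPH_EWMA_SHORT
    alpha_long: float = 0.1    # assumed value for APOLLO_GRAPH_EWMA_LONG
    weight_short: Optional[float] = None
    weight_long: Optional[float] = None

    def observe(self, success: bool) -> None:
        x = 1.0 if success else 0.0
        if self.weight_short is None or self.weight_long is None:
            # First observation pins both windows: no decay from an
            # arbitrary starting value.
            self.weight_short = self.weight_long = x
            return
        self.weight_short += self.alpha_short * (x - self.weight_short)
        self.weight_long += self.alpha_long * (x - self.weight_long)


edge = EdgeWeights()
edge.observe(True)    # both windows pinned to 1.0
edge.observe(False)   # the short window reacts faster than the long one
```

After one success followed by one failure, the short window has fallen to 0.5 while the long window is still at 0.9.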

B3. Synthesis dispatch (LLM-driven)

Trigger: _drain_worker calls SynthesisEngine.schedule(envelope) after the graph update.

  1. SynthesisEngine.schedule (apollo/learner/synthesis.py:135) looks up _SYNTHESIS_FLAVOR[envelope.event_type]. If None (e.g., llm_turn, final_response), returns immediately.
  2. If trace_id is in self._in_flight, records the latest envelope in self._latest_by_trace and returns — the running task picks it up on completion.
  3. Otherwise calls asyncio.create_task(self._run_trace(trace_id, flavor)).
  4. _run_trace (apollo/learner/synthesis.py:195) acquires self._sem (concurrency cap = APOLLO_SYNTHESIS_MAX_CONCURRENT), then calls _synthesize_from_envelope (:218).
  5. _synthesize_from_envelope calls _slice_graph_set (:344) to pull the relevant subgraph; calls the appropriate prompts.build_*_prompt (apollo/learner/prompts.py) to compose the LLM input.
  6. _synthesize_from_envelope:236 calls LLMClient.get().complete(system=..., messages=...) — this is Apollo's own LLM, separate from oracle's user-facing one.
  7. _synthesize_from_envelope:294 calls drift_module.run_all (see B4) to gate the proposal.
  8. _synthesize_from_envelope:315 appends the proposal to self._pending with status: "approved" or "drift_flagged". The pending list caps at 500; older entries are dropped (:317-318).
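
The per-trace coalescing in steps 2 to 4 (one running task per trace, latest envelope wins, a semaphore caps concurrency) can be sketched like this. MiniScheduler is a hypothetical reduction; the LLM call is replaced by a no-op await, and the `seq` field exists only to make the result observable.

```python
import asyncio


class MiniScheduler:
    def __init__(self, max_concurrent: int = 2) -> None:
        self._sem = asyncio.Semaphore(max_concurrent)
        self._in_flight: set[str] = set()
        self._latest_by_trace: dict[str, dict] = {}
        self.tasks: list[asyncio.Task] = []      # keep references alive
        self.runs: list[tuple[str, int]] = []

    def schedule(self, envelope: dict) -> None:
        trace_id = envelope["trace_id"]
        if trace_id in self._in_flight:
            # A task is already running for this trace: remember only the
            # newest envelope and let that task pick it up.
            self._latest_by_trace[trace_id] = envelope
            return
        self._in_flight.add(trace_id)
        self.tasks.append(asyncio.create_task(self._run_trace(trace_id, envelope)))

    async def _run_trace(self, trace_id: str, envelope: dict) -> None:
        try:
            while envelope is not None:
                async with self._sem:
                    await asyncio.sleep(0)       # stands in for the LLM call
                    self.runs.append((trace_id, envelope["seq"]))
                envelope = self._latest_by_trace.pop(trace_id, None)
        finally:
            self._in_flight.discard(trace_id)


async def main() -> list[tuple[str, int]]:
    sched = MiniScheduler()                      # created inside the running loop
    for seq in range(4):                         # four envelopes, same trace
        sched.schedule({"trace_id": "t1", "seq": seq})
    await asyncio.gather(*sched.tasks)
    return sched.runs


runs = asyncio.run(main())
```

Four envelopes on one trace produce two synthesis runs: the first envelope and the most recent one.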

B4. Drift gate (drift.run_all)

Trigger: _synthesize_from_envelope has an LLM proposal in hand.

  1. drift.run_all (apollo/learner/drift.py:249) calls check_proposed_pattern_vs_edges (:78) — proposal references must match real outcome-graph edges.
  2. Calls check_intent_classification_vs_clusters (:126) — proposed intent classes must match existing clusters.
  3. Calls check_weight_swings (:163) — z-score check against existing weight distribution; passes if <2 priors.
  4. Calls check_trajectory_coherence (:206) — proposed direction must align with EWMA trajectory; passes if no trajectory yet.
  5. Returns DriftCheckResult(approved=all_passed, checks=[per_check_detail]). No LLM involved here — pure math against graph state.
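
As an illustration of the "pure math" character of these checks, a z-score weight-swing check might look like the following. The function name and the max_z threshold are assumptions; only the "passes with fewer than two priors" rule comes from the step list above.

```python
import statistics


def check_weight_swing(priors: list[float], proposed: float,
                       max_z: float = 3.0) -> bool:
    """Pass when the proposed weight is within max_z standard deviations.

    max_z is an assumed threshold. With fewer than two priors there is
    no distribution to compare against, so the check passes.
    """
    if len(priors) < 2:
        return True
    mean = statistics.fmean(priors)
    stdev = statistics.stdev(priors)
    if stdev == 0:
        return proposed == mean
    return abs(proposed - mean) / stdev <= max_z
```

With priors of 0.4, 0.5, and 0.6, a proposal of 0.55 sits half a standard deviation from the mean and passes, while 0.95 sits about 4.5 standard deviations away and is flagged.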

B5. Evaluator pipeline (_evaluate_envelope)

Trigger: _drain_worker calls _evaluate_envelope (apollo/observer/ingest.py:280) after synthesis schedule for any qualifying observation.

  1. _evaluate_envelope:299 calls attribution.get().applied_for(trace_id, service_name) (apollo/evaluator/attribution.py) to find which artifacts the attacher recorded for this trace. Returns early if no attributions.
  2. _evaluate_envelope:306 calls signals.detect_signals(envelope, applied_artifact_ids) (apollo/evaluator/signals.py) — returns a list of SignalHit for L3_ERROR / SCHEMA_MISMATCH / USER_FEEDBACK / EVALUATOR_CONFIDENCE. Returns early on no hits.
  3. _evaluate_envelope:316 gets the singletons via scoring.get_engine() and recommendations.get_queue().
  4. Per signal hit: engine.apply_signal(artifact_id, signal_kind, magnitude) (apollo/evaluator/scoring.py) — pulls the EMA score toward its tier asymptote.
  5. After each apply, cascade.cascade_on_l3_dominant(engine, artifact_id) (apollo/evaluator/cascade.py) — returns CascadeOutcome with action in {none, drift_event, recommend_fast_demote, recommend_demote}.
  6. If non-none, _evaluate_envelope calls queue.add(Recommendation(...)) (apollo/evaluator/recommendations.py) with score + decomposition + upstream_artifact_ids. Replace-semantics: latest rec for an artifact overrides the prior one.
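
The replace-semantics in step 6 amount to keying the queue by artifact ID. A minimal sketch, with hypothetical field names on the Recommendation record:

```python
from dataclasses import dataclass, field


@dataclass
class Recommendation:
    artifact_id: str
    action: str       # e.g. "recommend_demote"
    score: float


@dataclass
class RecommendationQueue:
    _by_artifact: dict[str, Recommendation] = field(default_factory=dict)

    def add(self, rec: Recommendation) -> None:
        # The latest recommendation for an artifact overrides the prior one.
        self._by_artifact[rec.artifact_id] = rec

    def snapshot(self) -> list[Recommendation]:
        return list(self._by_artifact.values())


queue = RecommendationQueue()
queue.add(Recommendation("a1", "drift_event", 0.42))
queue.add(Recommendation("a1", "recommend_demote", 0.31))
queue.add(Recommendation("a2", "recommend_fast_demote", 0.10))
```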

B6. Curator atomic sequence (every mutation)

Trigger: any of actions.promote / demote / forget / edit / rollback, called from an admin endpoint, an admin-chat tool, or the autonomous sweep.

  1. The action calls pause.raise_if_paused (apollo/curator/pause.py:73) first — if curator is paused, raises CuratorPaused; the sequence stops.
  2. The action calls policy.allow_or_raise(ActionRequest(kind, actor, artifact_id)) (apollo/curator/policy.py:110) — six hard invariants enforced; raises CuratorPolicyViolation if any tripped.
  3. The action calls _copy_current_to_history(artifact_id) (apollo/curator/actions.py:105) — prior version moves to apollo_artifact_history with a retired_at stamp.
  4. The action calls _artifacts().create(record, uid=artifact_id) (actions.py:190 for promote; :255 demote; :392 edit) — overwrites current with the new version.
  5. The action calls audit.write_audit(ApolloAuditRecord(...)) (apollo/curator/audit.py:95) — required rationale, optional evidence_ref, indefinite=True for critical actions like forget/rollback/pause.
  6. The action calls _broadcast_commit(action_kind, artifact_id, actor) (actions.py:123) — fires SSEHub().broadcast({event: "curator_commit", ...}, scope="*"). SSE failures are swallowed; durable state already landed.
  7. Returns ActionResult(action, artifact_id, version, audit_record_id, before_version_id, after_version_id) to the caller.

The five mutation entry points: actions.py:148 promote, :222 demote, :297 forget, :349 edit, :424 rollback. All five share the same atomic shape above.
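
The seven-step shape can be condensed into a single function for illustration. Everything here is a stand-in: in-memory containers replace the Elasticsearch indices, the policy check shows one hypothetical invariant (an actor allowlist) rather than the real six, and broadcast always fails to demonstrate that SSE errors are swallowed.

```python
class CuratorPaused(RuntimeError):
    pass


state = {"paused": False}
ALLOWED_ACTORS = {"admin", "curator_auto"}   # hypothetical invariant
current: dict[str, dict] = {}
history: list[dict] = []
audit: list[dict] = []


def broadcast(event: dict) -> None:
    raise ConnectionError("no subscribers reachable")   # simulated SSE failure


def mutate(kind: str, artifact_id: str, content: dict,
           actor: str, rationale: str) -> dict:
    if state["paused"]:                                  # 1. pause gate
        raise CuratorPaused(kind)
    if actor not in ALLOWED_ACTORS:                      # 2. policy check
        raise PermissionError(actor)
    prior = current.get(artifact_id)
    if prior is not None:                                # 3. copy to history
        history.append({**prior, "retired": True})
    version = prior["version"] + 1 if prior else 1
    current[artifact_id] = {"id": artifact_id, "version": version, **content}  # 4. write
    audit.append({"action": kind, "artifact_id": artifact_id,
                  "actor": actor, "rationale": rationale})                      # 5. audit
    try:
        broadcast({"event": "curator_commit", "artifact_id": artifact_id})     # 6. SSE
    except Exception:
        pass                                             # durable state already landed
    return {"action": kind, "artifact_id": artifact_id, "version": version}    # 7. result


first = mutate("promote", "a1", {"text": "v1"}, "admin", "initial promote")
second = mutate("edit", "a1", {"text": "v2"}, "admin", "tighten wording")
```

Because the steps are sequential writes rather than a transaction, a crash between steps 3 and 4 in this sketch would also leave a history entry without a live successor.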

B7. Autonomous curator sweep (M12)

Trigger: oracle.app:65 schedules auto.run_periodic as a long-lived task at startup; fires on a periodic interval.

  1. auto.run_periodic (apollo/curator/auto.py:248) calls auto.sweep_once on each tick.
  2. sweep_once (apollo/curator/auto.py:112) reads settings.APOLLO_CURATOR_AUTONOMOUS. Disabled → returns {ran: False, reason: "disabled"}.
  3. sweep_once calls pause.is_paused() (apollo/curator/pause.py:65). Paused → returns {ran: False, reason: "paused"}.
  4. For each status: "approved" proposal in SynthesisEngine().pending_snapshot():
     • sweep_once calls derive_artifact_id(proposal) (auto.py:54), a deterministic prefix-hash, so repeated proposals converge on one artifact.
     • sweep_once calls actions.promote(actor="curator_auto", trigger="autonomous_curator", ...), which runs flow B6.
  5. For each kind in ("demote", "fast_demote") recommendation in RecommendationQueue.snapshot():
     • sweep_once calls actions.demote(actor="curator_auto", trigger="autonomous_curator", evaluator_score=..., score_decomposition=..., upstream_artifact_ids=...), which runs flow B6.
  6. sweep_once returns {ran: True, auto_promoted: N, auto_demoted: M, drift_retained: K}. Drift-class work (status: "drift_flagged" and kind: "drift_event") is left for admin review.
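
One way a deterministic ID can make repeated proposals converge is to hash only the identity-bearing fields. The sketch below is an assumption about what "prefix-hash" means here: the choice of fields, SHA-256, and the 16-character prefix are all illustrative, not taken from auto.py.

```python
import hashlib
import json


def derive_artifact_id(proposal: dict, prefix_len: int = 16) -> str:
    """Hash only identity-bearing fields so re-proposals converge.

    Hypothetical: the real function may hash different fields.
    """
    identity = {k: proposal.get(k) for k in ("type", "service_name", "tool_name")}
    digest = hashlib.sha256(
        json.dumps(identity, sort_keys=True).encode("utf-8")
    ).hexdigest()
    return f"{proposal.get('type', 'artifact')}-{digest[:prefix_len]}"


p1 = {"type": "failure_pattern", "service_name": "cortex",
      "tool_name": "intelligence_create", "confidence": 0.7}
p2 = {**p1, "confidence": 0.9}            # same identity, new confidence
p3 = {**p1, "tool_name": "other_tool"}    # different identity
```

Here p1 and p2 map to the same ID, so a second promote becomes a new version of the same artifact rather than a sibling.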

B8a. Prioritization-layer cross-cuts (Layers 1, 4-A, 6-A/B)

Where each layer hooks in. The prioritization rebuild added several call sites scattered across existing flows. Listed here together so a reader can map "what fires when" without re-reading each parent flow.

  1. Layer 1: capped-lineage persist. Inside attacher._attach (apollo/guidance/attacher.py) after _apply_attach_caps returns the (kept, dropped_pairs) tuple. If trace_id is present and dropped_pairs is non-empty, the attacher calls lineage.persist_capped(trace_id, scope, capped=dropped_pairs) (apollo/lineage/persist.py). Fire-and-forget; writes kind: "capped" rows to apollo_lineage_events with artifact_type + scope. Failures land in apollo_lineage_capped_persist_failed_total (no log noise).
  2. Layer 4-A: evaluator score writeback. Inside _evaluate_envelope after engine.apply_signal() in step 4 of flow B5. The ingest worker calls persist.persist_score_to_artifact(artifact_id, score, decomposition) (apollo/evaluator/persist.py), a fire-and-forget Painless script update writing content.evaluator_score and content.score_decomposition. The next attach call's _sort_key reads it. Kill switch: APOLLO_EVALUATOR_PERSIST_SCORES_ENABLED.
  3. Layer 6-A: promote-time embedding. Inside actions.promote (step in flow B6), right before _artifacts().create, the action calls _embed_and_find_similar (apollo/curator/actions.py) which:
     • Calls similarity.compute_embedding(content, artifact_type) (apollo/learner/similarity.py), which reuses axonis.memory.embedder.embed. Returns None if sentence-transformers is unavailable (graceful degradation).
     • If embedding succeeded, stores it under content.embedding_vector so the new record persists it on write.
  4. Layer 6-B: similarity advisory. Same _embed_and_find_similar helper then calls _load_active_set_for_similarity(artifact_type) and similarity.find_similar_active_artifacts(...). Hits ≥ APOLLO_SIMILARITY_THRESHOLD (default 0.9) are returned to the promote handler and surface in ActionResult.similar_artifacts. The promote still succeeds; the result is advisory only.

B8b. Coalescer sweep (Layer 6-C)

Trigger: oracle.app:_coalescer_task is created at startup when APOLLO_COALESCER_ENABLED=true (off by default); fires every APOLLO_COALESCER_INTERVAL_SEC (default 3600s).

  1. coalescer.run_periodic (apollo/learner/coalescer.py) sleeps APOLLO_COALESCER_INTERVAL_SEC then calls run_sweep_once on each wake.
  2. run_sweep_once calls _load_all_active_artifacts — single ES scan returning every status=active artifact.
  3. _find_clusters partitions by (type, applicability.service_name, applicability.tool_name). Within each partition, _pairwise_cluster runs union-find over cosine similarity ≥ APOLLO_COALESCER_THRESHOLD (default 0.85). Yields clusters of ≥ 2 members.
  4. Bounded by APOLLO_COALESCER_MAX_CLUSTERS_PER_RUN (default 5) — extras counted as summary.skipped.
  5. For each in-budget cluster: _propose_merger_for_cluster(cluster, client) calls Apollo's LLM via build_coalesce_prompt(artifact_type, cluster) (apollo/learner/prompts.py). The LLM writes a single artifact whose content covers every member's intent; the helper injects supersedes: [id1, id2, ...] listing the cluster members.
  6. _record_merger_proposal routes through SynthesisEngine._record_proposal (apollo/learner/synthesis.py) so the merger gets confidence normalization, drift checks, and lands on apollo_proposals like any other proposal.
  7. Bad JSON / LLM failures increment apollo_coalescer_merge_failed_total and skip the cluster — sweep continues.
  8. Returns {clusters_found, proposals_emitted, skipped} for /stats and tests.
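
The clustering in step 3 (union-find over pairwise cosine similarity within one partition) can be sketched in pure Python. The function names are hypothetical and the toy two-dimensional vectors stand in for real embeddings; only the 0.85 default threshold comes from the text above.

```python
import math
from collections import defaultdict


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pairwise_cluster(vectors: dict[str, list[float]],
                     threshold: float = 0.85) -> list[list[str]]:
    parent = {k: k for k in vectors}

    def find(k: str) -> str:
        while parent[k] != k:
            parent[k] = parent[parent[k]]   # path halving
            k = parent[k]
        return k

    ids = sorted(vectors)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)   # union the two sets

    groups: dict[str, list[str]] = defaultdict(list)
    for k in ids:
        groups[find(k)].append(k)
    # Only clusters with at least two members are merge candidates.
    return sorted(g for g in groups.values() if len(g) >= 2)


clusters = pairwise_cluster({
    "a1": [1.0, 0.0],
    "a2": [0.99, 0.1],
    "a3": [0.0, 1.0],
})
```

Union-find makes similarity transitive within a partition: if A is close to B and B is close to C, all three land in one cluster even when A and C fall below the threshold.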

Admin downstream. When the admin promotes a coalescer-emitted proposal via POST /api/v1/apollo/artifacts/{merger_id}/promote with supersede: true, the promote handler (flow B6, extended) reads proposal.supersedes and demotes each listed artifact in the same atomic batch (each via flow B6's demote path, each with its own audit record).

B8. Maintenance tick (M13)

Trigger: oracle.app:73 schedules maintenance.run_periodic as a long-lived task at startup; fires every APOLLO_MAINTENANCE_INTERVAL (default 1h).

  1. maintenance.run_periodic (apollo/maintenance.py:35) calls run_once on each tick.
  2. run_once:61 (apollo/maintenance.py:51) calls _purge_expired (:82) — for each of apollo_observations, apollo_audit, apollo_graph_snapshots: builds {"range": {"expires_ts": {"lt": now}}} and calls _delete_by_query (:114). Indefinite records (expires_ts: null) are skipped automatically by the range filter.
  3. run_once:65 calls _coarsen_snapshots (:146) — hourly→daily→weekly tiering hook. Currently a no-op stub; tier-generation logic deferred per Q5.
  4. run_once:69 calls _emit_metrics (:170) — sets apollo_maintenance_last_run_ts and increments apollo_maintenance_docs_deleted_total{index} per index.
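
The purge filter in step 2 can be illustrated by building the query body and modeling its match rule locally. The helper names are hypothetical, and expires_ts is assumed to be stored as an ISO-8601 string; the actual mapping may differ.

```python
from datetime import datetime, timezone

PURGE_INDICES = ("apollo_observations", "apollo_audit", "apollo_graph_snapshots")


def build_purge_query(now: datetime) -> dict:
    """Body for a delete-by-query call: everything that expired before now."""
    return {"query": {"range": {"expires_ts": {"lt": now.isoformat()}}}}


def is_purged(doc: dict, now: datetime) -> bool:
    """Local model of the range filter: a null expires_ts never matches."""
    expires = doc.get("expires_ts")
    return expires is not None and datetime.fromisoformat(expires) < now


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
query = build_purge_query(now)
```

A range clause only matches documents that have a value for the field, which is why indefinite records need no special-casing.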

/stats read-side helpers are siblings: degraded_emitters (:186) scans INGEST_LAST_INGEST_TS; intent_schema_coverage (:214) computes the rolling fraction.

C. Admin / lifecycle paths

C1. Admin chat loop (M11)

Trigger: admin POSTs /api/v1/apollo/chat with {"action": "chat", "message": "..."}.

  1. chat.server.admin_chat (apollo/chat/server.py:79) validates the admin role + parses the request.
  2. For action: "chat", admin_chat:117 calls _run_chat_loop(message, actor) (:196).
  3. _run_chat_loop:203 calls _render_tool_catalog_for_llm (:303) to build the system context.
  4. _run_chat_loop:209 gets LLMClient.get() and starts the bounded iteration loop (_MAX_CHAT_ITERATIONS = 6).
  5. Per iteration, :220 calls client.complete(messages, response_format="json").
  6. If LLM returned {"action": "call_tool", ...}: :253 calls _run_tool_or_400(tool_name, arguments, actor) (:134) which looks up chat_tools.TOOL_IMPLEMENTATIONS[tool_name] (apollo/chat/tools.py:538) and invokes it. Mutation tools route through actions.* (flow B6) with trigger="admin_chat".
  7. _run_chat_loop appends the tool result to messages and continues the loop.
  8. If LLM returned {"action": "respond", "text": ...}: returns the prose + tool_trail to admin_chat, which returns the JSON response to the operator.

For action: "invoke", admin_chat:109 skips the loop and calls _run_tool_or_400 directly — single-shot tool invocation without the LLM.
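
The bounded loop can be sketched with a scripted stand-in for the LLM. The JSON keys tool_name and arguments, the message roles, and the behavior when the iteration budget runs out are assumptions for illustration; only the two action values and the limit of 6 come from the steps above.

```python
import json
from typing import Callable

MAX_CHAT_ITERATIONS = 6


def run_chat_loop(message: str,
                  llm: Callable[[list[dict]], str],
                  tools: dict[str, Callable[..., object]]) -> dict:
    messages = [{"role": "user", "content": message}]
    tool_trail: list[dict] = []
    for _ in range(MAX_CHAT_ITERATIONS):
        decision = json.loads(llm(messages))
        if decision["action"] == "respond":
            return {"text": decision["text"], "tool_trail": tool_trail}
        if decision["action"] == "call_tool":
            name = decision["tool_name"]
            if name not in tools:
                raise KeyError(f"unknown tool: {name}")
            result = tools[name](**decision.get("arguments", {}))
            tool_trail.append({"tool": name, "result": result})
            messages.append({"role": "tool", "content": json.dumps(result)})
    # Assumed fallback; the real loop's exhaustion behavior is not specified here.
    return {"text": "iteration budget exhausted", "tool_trail": tool_trail}


def scripted_llm(messages: list[dict]) -> str:
    if len(messages) == 1:
        return json.dumps({"action": "call_tool", "tool_name": "is_paused",
                           "arguments": {}})
    return json.dumps({"action": "respond", "text": "Curator is running."})


reply = run_chat_loop("status?", scripted_llm, {"is_paused": lambda: False})
```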

C2. Pause / resume curator (M11)

Trigger: admin calls pause_curator(...) (chat tool or REST endpoint).

  1. pause.set_paused (apollo/curator/pause.py:80) sets the module-level _state (in-memory only — by design, not persisted across restart).
  2. set_paused:107 calls audit.write_audit(ApolloAuditRecord(action=PAUSE_CURATOR, indefinite=True, ...)) — pause records never expire.
  3. set_paused:110 calls SSEHub().broadcast({event: "curator_paused", ...}) so admin clients see the freeze in real time.
  4. From this moment, every Curator function calls raise_if_paused (apollo/curator/pause.py:73) at the top (step 1 of B6), causing CuratorPaused until clear_paused (:125) runs the inverse (audit at :151 + broadcast at :154).

pause.is_paused (:65) is the pure read — used by sweep_once (B7) and by chat tools that want to surface the freeze without raising.

C3. Startup

Trigger: oracle process boots; oracle.app.startup (apollo/app.py:46) runs from oracle's Starlette lifespan.

  1. oracle.app.startup:51 calls ingest_module.startup (apollo/observer/ingest.py:70) — initializes _queue, sets _writer / _graph_writer to defaults, spawns _drain_worker × APOLLO_INGEST_WORKER_CONCURRENCY (:84).
  2. oracle.app.startup:55 calls SynthesisEngine().set_graph_getter(...) so the synthesis dispatcher can pull subgraph excerpts from the live in-memory graph_set.
  3. oracle.app.startup:58 schedules snapshots_module.run_periodic (hourly graph snapshot loop) as _snapshot_task.
  4. oracle.app.startup:65 schedules auto.run_periodic (B7) as _auto_task.
  5. oracle.app.startup:73 schedules maintenance.run_periodic (B8) as _maint_task.

What's NOT wired today: graph_set.load_from_records(...) (apollo/learner/graphs.py:340) is referenced in design docs and exercised by tests, but oracle's startup path does not call it. The graph_set comes up empty on each restart and rebuilds as observations stream in. The pause state is also intentionally non-durable — a fresh process always comes up unpaused (per apollo/curator/pause.py docstring).

C4. Secondary-path ingest (admin replay / out-of-process emitter)

Trigger: an admin or out-of-process service POSTs to /api/v1/apollo/observations.

  1. TraceparentMiddleware and OAuthMiddleware run as on the /chat path (A1 steps 1–2).
  2. guidance.api.post_observations (apollo/guidance/api.py:66) parses the body into an ObservationBatch. Stamps caller_identity from the token if the envelope didn't carry one.
  3. post_observations calls ingest.ingest(envelope) (apollo/observer/ingest.py:143) per item — exact same downstream path as the primary in-process call (the rest of B1 from step 1 onward).
  4. Returns 202 Accepted with {accepted, dropped} as soon as every envelope is enqueued or counted as full.
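
The accepted/dropped accounting in step 4 follows directly from non-blocking enqueue. A minimal sketch, with enqueue_batch as a hypothetical helper and a queue of size 2 chosen to force a drop:

```python
import asyncio


def enqueue_batch(queue: asyncio.Queue, envelopes: list[dict]) -> dict:
    accepted = dropped = 0
    for envelope in envelopes:
        try:
            queue.put_nowait(envelope)   # never blocks the request handler
            accepted += 1
        except asyncio.QueueFull:
            dropped += 1                 # counted, not retried
    return {"accepted": accepted, "dropped": dropped}


async def main() -> dict:
    queue: asyncio.Queue = asyncio.Queue(maxsize=2)
    return enqueue_batch(queue, [{"id": i} for i in range(3)])


summary = asyncio.run(main())
```

The handler can return as soon as the loop finishes, because persistence happens later in the drain workers.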

Notes for readers

  • Flow A1 is the umbrella — A2–A7 are sub-flows it triggers, in roughly that order during a single /chat.
  • Flow B6 is the universal mutation shape — every place artifacts change (admin endpoint, admin chat, autonomous sweep) routes through it.
  • Apollo never originates a network call. Apollo lives inside oracle's process; every httpx.post you see in flows above is oracle calling out, not Apollo. Apollo's only HTTP surface is the inbound /api/v1/apollo/* routes oracle mounts.
  • Observations and guidance go in opposite directions — oracle → Apollo for observations (in-process); Apollo → subscriber for guidance (response-attached). Both ride existing envelopes; neither uses a separate transport.
  • Synthesis runs before the evaluator on each envelope (_drain_worker order: write → graph update → synthesis schedule → evaluator). Both are wrapped in defensive try/except so neither can wedge ingest.
  • B6 is the atomic boundary, not a transaction. Steps 3–6 are tightly coupled but ES is not transactional across indices — partial failure at step 4 leaves a history record with no live successor. Tracked in §Future Improvements §7.1.

For deeper detail (ambient state tables, telemetry counter inventory, full call graphs), see oracle/specs/APOLLO-TECHNICAL-OVERVIEW.md. For the why behind each design choice, see oracle/specs/SPEC-PLATFORM-14-APOLLO.md §Design Decisions.

Technical Overview

A working technical reference describing the live Apollo runtime: who calls what, when, why, and how. Focused on the call graph, not the code layout. This complements §Function Flow Index (which is the file:line citation index) — this section is the operational call graph: what fires, under what condition, in what process, on what thread, backed by what state.

Process topology — where each component lives

Apollo is a package inside oracle's process. It is not a standalone service; it has no network of its own; it shares oracle's Python interpreter, asyncio event loop, Starlette app, and auth middleware.

| Process | What runs there | Notes |
|---|---|---|
| oracle | Oracle's REST/MCP handlers + Apollo (as a package) | The only externally-reachable service (SPEC-03 §1) |
| cortex | Cortex's Starlette app + MCP handler + request-scoped ApolloGuidanceCache (M14, imported from axonis.core.oracle.guidance_cache) | No Apollo emission code; no ApolloClient; no ApolloIntegration |
| beacon | Beacon's chat ingress + per-provider LLM call | Deferred: beacon has no HTTP connection to oracle today (MCP_SERVER_URL points at cortex direct). L1 subscriber wiring follows the cortex SDK pattern once that connection is designed. |
| parallax | Deferred from Phase 1 | When onboarded, follows the cortex MCP-handler pattern (argument pop + request-scoped cache) |
| other browser / L1 clients | L1 code: composes prompts, renders responses | Optional local ApolloGuidanceCache per session, same pattern as beacon will follow once L1 wiring lands |
| admin CLI / admin browser | Admin tooling: calls /api/v1/apollo/* endpoints | Only admin-role tokens get past the guard |

Consequence. There is no "Apollo server" to deploy independently. There is no IPC between oracle and Apollo. Every call into Apollo from oracle is a direct Python function call on the same event loop.

The one rule that shapes every call graph

Neither L1 nor L3 calls Apollo directly. Both call oracle. Oracle calls Apollo.

Captured as Invariant #14 in the design spec. Every call graph in this section respects it. When you see something that looks like it violates it, re-read — the caller is always oracle or an admin.

Ambient state — what's alive for the duration of a request

Apollo uses three kinds of ambient state. Each has a clearly-scoped lifetime.

| State | Type | Scope | Who sets | Who reads |
|---|---|---|---|---|
| axonis.core.trace._current_traceparent | ContextVar[str] | Per inbound request (per async task) | TraceparentMiddleware | Every emit helper, extract_http_headers, every outbound httpx call that forwards traceparent |
| request.state.token_payload | Attribute on the Starlette request object | Per request | OAuthMiddleware (axonis-core) | Every FastAPI dependency that does auth (require_auth, require_admin) |
| SynthesisEngine._in_flight / _latest_by_trace / _pending | Process-wide dicts on the singleton | Process lifetime | SynthesisEngine.schedule() | Drain worker + pending_snapshot() for admin |
| apollo_artifacts / apollo_artifact_history / apollo_audit | Elasticsearch indices (Milestone 9) | Persistent | curator.actions.* (promote / demote / forget / edit / rollback) | GET /artifacts + GET /audit + subscriber attach path |
| AttributionRegistry._by_trace | Process-wide dict on the M10 singleton | Per-request (aged out at TTL) | attacher.for_l1 / for_l3_agent when called with trace_id= | ingest._evaluate_envelope on every qualifying observation |
| ScoringEngine._scores | Process-wide dict on the M10 singleton | Process lifetime | ingest._evaluate_envelope → engine.apply_signal() | cascade.cascade_on_l3_dominant(), /recommendations, /stats |
| RecommendationQueue._by_artifact | Process-wide dict on the M10 singleton | Process lifetime | ingest._evaluate_envelope → queue.add() | GET /api/v1/apollo/recommendations; cleared by curator.demote() |
| curator.pause._state (PauseState) | Process-wide dataclass (M11) | Process lifetime (reset on restart) | curator.pause.set_paused() / clear_paused() | raise_if_paused() at top of every Curator mutation; oracle.chat.tools.pause_curator / resume_curator |
| Autonomous sweep loop (M12) | asyncio.Task in oracle.app._auto_task | Process lifetime | oracle.curator.auto.run_periodic started from oracle.app.startup | N/A; writes to audit + SSE on each commit |
| Maintenance loop (M13) | asyncio.Task in oracle.app._maint_task | Process lifetime | oracle.maintenance.run_periodic started from oracle.app.startup | Purges expired docs via delete_by_query; updates apollo_maintenance_last_run_ts + apollo_maintenance_docs_deleted_total |
| Synthesis sweep loop | asyncio.Task in oracle.app._sweep_task | Process lifetime | oracle.learner.synthesis.run_sweep_periodic started from oracle.app.startup | Periodic event-independent synthesis pass; updates apollo_synthesis_sweep_* metrics |
| Coalescer loop (Layer 6-C) | asyncio.Task in oracle.app._coalescer_task | Process lifetime | oracle.learner.coalescer.run_periodic started from oracle.app.startup (off by default; APOLLO_COALESCER_ENABLED=true to activate) | Scans active artifacts for similarity clusters; queues LLM-merged proposals on apollo_proposals with supersedes: [...] |
| Score writeback (Layer 4-A) | Fire-and-forget tasks scheduled per signal | Per-signal | oracle.evaluator.persist.persist_score_to_artifact called from ingest._evaluate_envelope | Updates content.evaluator_score + content.score_decomposition on the artifact doc; sort key reads it on next attach |
| SSEHub._subs | Process-wide dict on the singleton | Process lifetime | GET /guidance/stream handler | broadcast() from Curator (M9+), GET /subscribers |
| ingest._queue, ingest._dedup_window | Module-level asyncio objects | Process lifetime | ingest.startup() | ingest.ingest(), drain workers |
| graph_set | Module-level GraphSet instance | Process lifetime | ingest.startup() | Drain workers (extractors), SynthesisEngine, hourly snapshot task |

The /chat call graph — the canonical flow

Every box below is a function/method invocation. The arrows are synchronous calls unless marked [async]. "Oracle" and "Apollo" are both inside the same process; every → between them is an in-process function call.

Inbound: L1 → Oracle

POST /api/v1/chat is oracle's user-facing chat surface, driven by oracle's own LLM tool-use loop in oracle/server/llm/tool_executor.py (5-provider gateway). Today's L1 caller is curl or another direct API client; beacon onboards once a beacon↔oracle connection is wired. This is a separate surface from POST /api/v1/apollo/chat (Apollo's admin chat at oracle/oracle/chat/server.py:79), which runs Apollo's independent MiniMax LLM for talking to Apollo's synthesis brain.

L1 caller  (curl / future beacon)
  │ POST /chat  (HTTP; body: {message, conversation_id, model}; headers: Authorization + traceparent)
  ▼
oracle uvicorn worker
  │
  ├─▶ TraceparentMiddleware.__call__()                       [ASGI middleware, outer]
  │     ├─▶ axonis.core.trace.parse_traceparent(header)
  │     └─▶ axonis.core.trace.set_current_traceparent(value)  (ContextVar installed)
  │
  ├─▶ OAuthMiddleware.__call__()                             [axonis-core, next inner]
  │     └─▶ validates Bearer; writes request.state.token_payload
  │
  └─▶ FastAPI router → routes.chat(body, request, token_payload)

Why: the middlewares run outside the handler so trace_id and token_payload are already in scope when business logic runs. Both are per-request ambient state — no code needs to thread them explicitly across ~10 internal function boundaries.
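
How a traceparent header becomes ambient state can be sketched with the W3C Trace Context format (version, 32-hex trace-id, 16-hex parent-id, 2-hex flags). The function names mirror those in the diagram, but the bodies are illustrative assumptions, not the axonis-core implementation.

```python
import re
from contextvars import ContextVar
from typing import Optional

_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)
_current_traceparent: ContextVar[Optional[str]] = ContextVar(
    "traceparent", default=None
)


def parse_traceparent(header: str) -> Optional[str]:
    """Return the normalized header, or None if it is malformed."""
    match = _TRACEPARENT_RE.match(header.strip().lower())
    if match is None:
        return None
    version, trace_id, parent_id, _flags = match.groups()
    # The spec forbids version ff and all-zero trace or parent IDs.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return match.group(0)


def current_trace_id() -> Optional[str]:
    value = _current_traceparent.get()
    return value.split("-")[1] if value else None


header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
token = _current_traceparent.set(parse_traceparent(header))
trace_id_in_request = current_trace_id()   # any code in this context can read it
_current_traceparent.reset(token)          # request finished
```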

Handler: oracle's /chat → Apollo (L1 emit)

routes.chat(body, request, token_payload)
  │
  ├─▶ RateLimiter().check(client_id)                         [local, guardrail]
  ├─▶ GuardrailPolicy.load() → filter allowed tools
  ├─▶ ConversationStore().get(conversation_id)               [Redis or in-mem fallback]
  │
  ├─▶ axonis.core.trace.current_trace_id()                   [reads ContextVar]
  │     └─▶ parsed value from TraceparentMiddleware
  │
  ├─▶ oracle.hooks.chat.emit_user_prompt(
  │       prompt=body.message,
  │       conversation_id=body.conversation_id,
  │       token_payload=token_payload,
  │       trace_id=<from ContextVar>)                        [async, in-process]
  │     │
  │     └─▶ oracle.observer.ingest.ingest(envelope)
  │           └─▶ _queue.put_nowait(envelope)                 (non-blocking)
  │                 ↳ if QueueFull: metrics.INGEST_QUEUE_DROPPED_TOTAL.inc()
  │
  └─▶ ToolExecutor().run(...)  [see tool-use loop]

Why: L1 never reaches Apollo. Oracle extracts L1 signals from the /chat body and calls the observer in-process. emit_user_prompt is the canonical entry point — it builds the envelope, stamps caller_identity from token_payload, and drops it on the async queue. The helper never raises into the handler; if the queue is full, it counts + returns.

Tool-use loop: Oracle ↔ LLM + Oracle → L3 → Apollo

ToolExecutor.run(...)
  │
  ├─▶ for iteration in range(max_iterations):
  │
  │   ├─▶ llm.router.complete(messages, model, tools, system)
  │   │     └─▶ provider-specific SDK call (anthropic / openai / …)
  │   │
  │   ├─▶ oracle.hooks.chat.emit_llm_turn(...)               [async, in-process]
  │   │     └─▶ ingest.ingest(envelope type=llm_turn)
  │   │
  │   ├─▶ if tool_calls:
  │   │     for tc in tool_calls:
  │   │       │
  │   │       ├─▶ registry.get_tool(tool_name)                 (ServiceRegistry lookup)
  │   │       ├─▶ t0 = time.perf_counter()
  │   │       ├─▶ _call_backend_tool(...)                      [see outbound MCP dispatch]
  │   │       ├─▶ latency_ms = (time.perf_counter() - t0) * 1000
  │   │       │
  │   │       ├─▶ if error in result_text:
  │   │       │     oracle.hooks.chat.emit_tool_error(...)    [async]
  │   │       │   else:
  │   │       │     oracle.hooks.chat.emit_tool_output(...)   [async]
  │   │       │
  │   │       │   (both go through ingest.ingest() → in-process queue)
  │   │       └─▶
  │   └─▶ else: break
  │
  └─▶ Meter.record_tokens(client_id, provider, …)

Why: Oracle's LLM loop is the L2 emitter for llm_turn events and the L3 observer for tool_output/tool_error. The emitter wraps the backend call with a timer so latency rides on every envelope. Errors are detected by parsing _call_backend_tool's JSON return (it serializes errors into {"error": ...} rather than raising) so the emit path can cleanly branch between output and error.
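A minimal sketch of the timer-plus-branch step, assuming the backend call returns a string. The helper names (is_error_result, timed_tool_call) are hypothetical, and treating non-JSON output as ordinary tool output is an assumption of this sketch rather than documented behavior.

```python
import json
import time
from typing import Callable


def is_error_result(result_text: str) -> bool:
    """True when the backend serialized a failure as {"error": ...}."""
    try:
        parsed = json.loads(result_text)
    except (TypeError, ValueError):
        return False  # assumption: non-JSON text counts as normal output
    return isinstance(parsed, dict) and "error" in parsed


def timed_tool_call(call: Callable[[], str]) -> dict:
    """Wrap a backend call with a timer and pick the event type to emit."""
    t0 = time.perf_counter()
    result_text = call()
    latency_ms = (time.perf_counter() - t0) * 1000
    event_type = "tool_error" if is_error_result(result_text) else "tool_output"
    return {"event_type": event_type, "output": result_text, "latency_ms": latency_ms}
```

Parsing the JSON rather than searching for the substring "error" avoids misclassifying a successful result whose text merely mentions an error.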

Oracle → L3 (outbound MCP dispatch)

_call_backend_tool(registry, tool_name, tool_args, raw_token)
  │
  ├─▶ base_url = registry.get_tool_route(tool_name)
  │   (if None → local oracle tool; short-circuit path omitted here)
  │
  ├─▶ headers = {Content-Type, Accept}
  ├─▶ headers["Authorization"] = f"Bearer {raw_token}"
  │
  ├─▶ axonis.core.trace.get_current_traceparent()           [reads ContextVar]
  │     └─▶ headers["traceparent"] = <value>                  (if present)
  │
  ├─▶ tool_info = registry.get_tool(tool_name)
  │   if tool_info.component_kind == "agent":
  │     oracle.guidance.attacher.for_l3_agent(
  │         service_name=tool_info.service_name,
  │         tool_name=tool_name,
  │         intent_class=None)                               [in-process; bounded by APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS]
  │     └─▶ if not None: tool_args["apollo_guidance"] = <payload>
  │   else:
  │     (library → skip guidance injection entirely)
  │
  └─▶ httpx.AsyncClient(timeout=30.0).post(
        f"{base_url}/agentspace/mcp",
        json={jsonrpc, method="tools/call", params={name, arguments}},
        headers=headers)

Why: oracle is the only place that knows the component_kind of each L3 target. It filters libraries out of guidance attachment, preserves Authorization end-to-end for L3 to authenticate the user, and forwards traceparent so L3's own logs correlate with oracle's observations. The attacher runs in-process (Apollo lives here) so there is no network hop to fetch guidance.

L3 side: what cortex does with apollo_guidance

The flow below is what M14 wires (current Phase 1 subscriber for L3). Through M5–M13 the attached field rode the wire but FastMCP stripped it before the tool ran — guidance was effectively dark on the consumption side. M14 adds the pop + cache update + accessor reads.

cortex MCP handler receives tool call (cortex/cortex/server/mcp_handler.py:_handle_tools_call)
  │
  ├─▶ apollo_guidance = arguments.pop("apollo_guidance", None)
  ├─▶ cache = ApolloGuidanceCache()                           [from axonis.core.oracle.guidance_cache]
  ├─▶ cache.update(apollo_guidance)                           [no-op when None]
  ├─▶ apollo_cache_var.set(cache)                             [request-scoped ContextVar]
  │
  ├─▶ call_tool(tool_name, arguments)   # arguments no longer carries apollo_guidance
  │   ├─▶ during any LLM call inside the tool, the implementation reads:
  │   │     cache.get_system_prompt_additions(intent_context)   → appended to system prompt
  │   │     cache.get_tool_description_overrides(tool_name)     → applied per tool
  │   │     cache.get_spec_fragments(...)                       → optional
  │   │     cache.get_tool_pairing_hints(...)                   → optional
  │   │     cache.get_active_failure_patterns(...)              → optional
  │   │     cache.get_service_connection_hints(...)             → optional
  │   │
  │   └─▶ folds the results into its prompts / routing decisions
  │
  └─▶ returns the MCP response back to oracle (the contextvar resets when the handler returns)

Why: cortex never initiates any call to Apollo. The only Apollo-facing contract it carries is this read path — consume the guidance, apply it, discard it. Cache lifetime naturally scopes to the request because L3 only acts inside a user-request context; there is no background state to maintain between requests.
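The pop, cache update, and request-scoped ContextVar can be illustrated with a reduced sketch. GuidanceCache here is a hypothetical stand-in for ApolloGuidanceCache with a single accessor, and the payload key system_prompt_additions is an assumption; the real payload shape is not specified in this section.

```python
from contextvars import ContextVar
from typing import Any, Callable, List, Optional


class GuidanceCache:
    """Hypothetical, much-reduced stand-in for ApolloGuidanceCache."""

    def __init__(self) -> None:
        self._additions: List[str] = []

    def update(self, payload: Optional[dict]) -> None:
        if payload is None:
            return  # no-op when the caller attached nothing
        self._additions = list(payload.get("system_prompt_additions", []))

    def get_system_prompt_additions(self) -> List[str]:
        return list(self._additions)


apollo_cache_var: ContextVar = ContextVar("apollo_cache", default=None)


def handle_tools_call(arguments: dict, call_tool: Callable[[dict], Any]) -> Any:
    """Pop the guidance, expose it via a ContextVar, run the tool, then reset."""
    guidance = arguments.pop("apollo_guidance", None)
    cache = GuidanceCache()
    cache.update(guidance)
    token = apollo_cache_var.set(cache)
    try:
        return call_tool(arguments)  # arguments no longer carries apollo_guidance
    finally:
        apollo_cache_var.reset(token)  # cache lifetime ends with the request
```

Resetting in a finally block keeps the cache scoped to the request even when the tool raises.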

Parallax is deferred from Phase 1; when it onboards, its MCP handler follows the same pattern (argument pop + request-scoped cache contextvar).

Oracle → Apollo (L3 observation)

Happens after the outbound MCP dispatch completes, still inside ToolExecutor.run. See the tool-use loop for the wrapping; the actual emit is:

oracle.hooks.chat.emit_tool_output(
    trace_id=<from ContextVar>,
    conversation_id=...,
    token_payload=token_payload,
    service_name=tool_info.service_name,  # e.g., "cortex"
    tool_name=tool_name,
    arguments=<original tool_args, apollo_guidance stripped>,
    output=result_text,
    latency_ms=...)
  │
  └─▶ ingest.ingest(envelope)

Why service_name is the L3 target, not "oracle": per the oracle-sole-observer design, oracle is the actual emitter, but the envelope records what was observed. The Evaluator (M10) keys on this field to apply per-service signal gates.

Why apollo_guidance is stripped before recording: Apollo must not observe its own injections as if they were part of the caller's intent. emit_tool_output strips the key explicitly.

Outbound: Oracle → L1

routes.chat(...) continues:
  │
  ├─▶ ConversationStore.append(...)                          [if conversation_id]
  │
  ├─▶ oracle.hooks.chat.emit_final_response(...)              [async, in-process]
  │     └─▶ ingest.ingest(envelope type=final_response)
  │
  ├─▶ oracle.guidance.attacher.for_l1(
  │       user=token_payload.subject,
  │       intent_class=None)                                   [in-process; bounded attach]
  │
  └─▶ return ChatResponse(response, conversation_id, tool_calls,
                          model_used, tokens, apollo_guidance=<payload>)

Why: final_response records what actually reached L1 (not what oracle intended — what the envelope carried). apollo_guidance on the response body is the L1 subscription channel; whenever L1's own ApolloGuidanceCache.update(...) is called with this payload, L1's next prompt composition sees the freshest guidance. Oracle's own chat LLM (the tool-use loop that produced this response) consumes the same guidance via the L2 in-process path before assembling the next turn.

Background workers — what runs off the request path

Apollo runs four long-lived tasks inside oracle's event loop. None of them block request handling.

| Task | Started | Does what | Frequency |
|---|---|---|---|
| _drain_worker × N | ingest.startup() (N = APOLLO_INGEST_WORKER_CONCURRENCY, default 4) | Dequeues envelopes → writes to apollo_observations → runs extractors → upserts graphs → fires SynthesisEngine().schedule() | Continuously; blocks on _queue.get() |
| run_periodic (snapshot loop) | oracle.app.startup() | Serializes the in-memory graph_set → apollo_graph_snapshots | APOLLO_GRAPH_SNAPSHOT_INTERVAL (default 3600s) |
| SynthesisEngine per-trace task | SynthesisEngine.schedule(envelope) | Pulls subgraph → builds prompt → calls LLM → runs drift checks → appends to _pending | Fires per-trace; collapses by trace_id; bounded by APOLLO_SYNTHESIS_MAX_CONCURRENT (default 4) |
| ApolloClient._timer_loop | ApolloClient.start() | Periodic flush of the per-auth-token buffer | APOLLO_INGEST_FLUSH_INTERVAL_MS (default 500ms); secondary path only |

Drain worker call graph (inside ingest._drain_worker)

loop:
  envelope = await _queue.get()
  │
  ├─▶ if _is_duplicate(envelope):  # (trace_id, event_type, timestamp, service) within window
  │     metrics.INGEST_DEDUPE_TOTAL.inc()
  │     _queue.task_done()
  │     continue
  │
  ├─▶ await _write_with_retry(envelope)                    (ES write through ApolloObservations UDS store)
  │
  ├─▶ extractors_module.apply(envelope, graph_set)          [synchronous; pure Python]
  │     └─▶ for each of 5 graphs: upsert nodes + edges + EWMAs, mark dirty
  │
  ├─▶ nodes, edges = graph_set.drain_all_dirty()
  ├─▶ if (nodes or edges) and _graph_writer:
  │     await _graph_writer(nodes, edges)                   (ES upserts)
  │
  ├─▶ metrics.INGEST_LAST_INGEST_TS.labels(service=…).set(time.time())
  │
  └─▶ try:
        SynthesisEngine().schedule(envelope)                 [M8; fires LLM pass if event type triggers]
      except: log + continue (never wedges ingest)

Why this order: persistence before graph updates before synthesis. If a replay from Elastic ever re-runs extractors over stored observations in arrival order, the resulting graph state will be byte-identical to the original.
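The duplicate check at the top of the drain loop keys on (trace_id, event_type, timestamp, service) within a window. One way to sketch such a window is shown below; the DedupeWindow class, the 60-second default, and the dict envelope are assumptions for illustration, not the actual _is_duplicate implementation.

```python
import time
from collections import OrderedDict
from typing import Optional


class DedupeWindow:
    """Remember recently seen envelope keys for window_sec seconds."""

    def __init__(self, window_sec: float = 60.0) -> None:
        self.window_sec = window_sec
        self._seen: "OrderedDict[tuple, float]" = OrderedDict()

    def is_duplicate(self, envelope: dict, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Evict expired keys; insertion order equals time order.
        while self._seen:
            _oldest_key, seen_at = next(iter(self._seen.items()))
            if now - seen_at <= self.window_sec:
                break
            self._seen.popitem(last=False)
        key = (envelope["trace_id"], envelope["event_type"],
               envelope["timestamp"], envelope["service"])
        if key in self._seen:
            return True
        self._seen[key] = now
        return False
```

Because duplicates are never re-inserted, the oldest entry is always at the front, so eviction stays proportional to the number of expired keys.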

Synthesis coalescing

SynthesisEngine.schedule(envelope):
  │
  ├─▶ flavor = _SYNTHESIS_FLAVOR.get(envelope.event_type)
  │   if flavor is None: return None                         (llm_turn, final_response skip)
  │
  ├─▶ self._latest_by_trace[trace_id] = envelope             (always record latest)
  │
  ├─▶ if trace_id in self._in_flight: return None            (already running — follow-up pass
  │                                                           will pick up the newer envelope)
  ├─▶ self._in_flight.add(trace_id)
  └─▶ return asyncio.create_task(self._run_trace(trace_id, flavor))

_run_trace(trace_id, flavor):
  │
  ├─▶ async with self._sem:                                   (bounded concurrency)
  │     envelope = self._latest_by_trace.pop(trace_id, None)
  │     if not envelope: return
  │     await self._synthesize_from_envelope(envelope, flavor)
  │
  └─▶ finally:
        self._in_flight.discard(trace_id)
        if trace_id in self._latest_by_trace:                 (arrived during run → re-queue)
          asyncio.create_task(self._run_trace(...))

Why: a burst of three tool_error events arriving together on one trace produces one LLM call, not three. Envelopes that arrive while a pass is running are folded into a single follow-up pass. The latest envelope wins, because it carries the most recent state.
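A runnable sketch of the coalescing idea, under stated assumptions: the Coalescer class and the synthesize callback are hypothetical, flavor selection is omitted, and the follow-up pass is handled by looping inside the same task rather than by spawning a new one. The loop is a simplification that keeps the trace marked in-flight while the follow-up runs.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional, Set


class Coalescer:
    """Collapse bursts per trace_id; construct inside a running event loop."""

    def __init__(self, synthesize: Callable[[dict], Awaitable[None]],
                 max_concurrent: int = 4) -> None:
        self._synthesize = synthesize
        self._sem = asyncio.Semaphore(max_concurrent)
        self._latest_by_trace: Dict[str, dict] = {}
        self._in_flight: Set[str] = set()

    def schedule(self, envelope: dict) -> Optional["asyncio.Task"]:
        trace_id = envelope["trace_id"]
        self._latest_by_trace[trace_id] = envelope  # always record latest
        if trace_id in self._in_flight:
            return None  # the running task will pick up the newer envelope
        self._in_flight.add(trace_id)
        return asyncio.create_task(self._run_trace(trace_id))

    async def _run_trace(self, trace_id: str) -> None:
        try:
            while True:
                envelope = self._latest_by_trace.pop(trace_id, None)
                if envelope is None:
                    return
                async with self._sem:  # bounded concurrency
                    await self._synthesize(envelope)
        finally:
            self._in_flight.discard(trace_id)


async def demo() -> list:
    calls = []

    async def fake_synthesize(envelope: dict) -> None:
        calls.append(envelope["seq"])
        await asyncio.sleep(0)

    coalescer = Coalescer(fake_synthesize)
    first = coalescer.schedule({"trace_id": "t1", "seq": 1})
    coalescer.schedule({"trace_id": "t1", "seq": 2})
    coalescer.schedule({"trace_id": "t1", "seq": 3})
    await first
    return calls
```

Running asyncio.run(demo()) records a single synthesis call, made with the third envelope, since all three schedule calls land before the task first runs.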

The two ingest paths — primary vs secondary

| Path | Caller | Mechanism | Auth | Used by |
|---|---|---|---|---|
| Primary (in-process) | Oracle's own code (routes.chat, ToolExecutor, mcp/server._proxy) | Direct Python function call: oracle.hooks.chat.emit_*(...) → ingest.ingest(envelope) | None; authenticated at ingress by OAuthMiddleware | Every Phase-1 event type from oracle + cortex (parallax deferred) |
| Secondary (HTTP POST) | ApolloClient in axonis-core (admin replay / out-of-process emitters) | POST /api/v1/apollo/observations with Bearer token + traceparent header; goes through the FastAPI handler which delegates to ingest.ingest() | Bearer token required | Admin replay/seed; future services oracle can't observe via MCP |

Both paths end on the same queue and the same worker pool. The only difference is how the envelope arrives.

Secondary path auth

POST /api/v1/apollo/observations
  │
  ├─▶ TraceparentMiddleware (sets ContextVar)
  ├─▶ OAuthMiddleware (validates Bearer → request.state.token_payload)
  │
  └─▶ guidance.api.post_observations(request):
        │
        ├─▶ payload = request.state.token_payload
        ├─▶ caller = _caller_identity_from_token(payload)
        │
        ├─▶ body = await request.json()
        ├─▶ batch = ObservationBatch.model_validate(body)     (400 on malformed)
        │
        └─▶ for envelope in batch.observations:
              if envelope.caller_identity.username is None:
                envelope.caller_identity = caller              (stamp from token)
              await ingest.ingest(envelope)

Guidance attachment — the two attach points

L1 attach (/chat response)

When: at the bottom of every routes.chat handler, just before constructing ChatResponse. Who calls: oracle (in-process). How:

oracle.guidance.attacher.for_l1(user=<subject>, intent_class=<class or None>)
  │
  ├─▶ if not APOLLO_GUIDANCE_ATTACH_ENABLED: return None
  │
  ├─▶ await asyncio.wait_for(
  │     oracle.guidance.selectors.match_artifacts(...),
  │     timeout=APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS / 1000)
  │
  ├─▶ on timeout:
  │     metrics.GUIDANCE_ATTACH_TIMEOUT_TOTAL.labels(scope="l1").inc()
  │     return None                                           (response serializes without the field)
  │
  └─▶ return {"as_of": now(), "artifacts": [...], "rationale_summary": "..."}
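The bounded attach can be sketched with asyncio.wait_for. The function name attach_guidance and its parameters are hypothetical; the timeout counter and the rationale_summary field are left out to keep the example short.

```python
import asyncio
import time
from typing import Awaitable, Callable, List, Optional


async def attach_guidance(
    match_artifacts: Callable[[], Awaitable[List[dict]]],
    timeout_ms: int,
    enabled: bool = True,
) -> Optional[dict]:
    """Return a guidance payload, or None when disabled or too slow."""
    if not enabled:
        return None
    try:
        artifacts = await asyncio.wait_for(match_artifacts(), timeout=timeout_ms / 1000)
    except asyncio.TimeoutError:
        return None  # the response serializes without the field
    return {"as_of": time.time(), "artifacts": artifacts}
```

Returning None on timeout means a slow selector costs the request at most the configured budget, and the caller never has to distinguish "disabled" from "timed out".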

L3 attach (MCP dispatch)

When: inside _call_backend_tool before the outbound httpx POST, and inside the MCP proxy's _proxy before the dispatch. Who calls: oracle (in-process). How: same shape as L1 but scope-tagged as l3:<service>. Skipped entirely when tool_info.component_kind == "library".

Subscriber consumption

Subscribers don't call Apollo. They call their own local ApolloGuidanceCache.update(payload) when they receive an envelope with apollo_guidance. The cache is a pure in-process data structure; no network, no auth, no transport.

Synthesis trigger graph

Which observation types trigger which LLM flavor:

| Event type | Flavor | When | Who triggers | Prompt builder |
|---|---|---|---|---|
| user_prompt | intent_pattern | L1 sends /chat | Drain worker post-graph-update | build_intent_pattern_prompt |
| intent_schema | intent_pattern | L1 includes schema block | Drain worker | build_intent_pattern_prompt |
| tool_output | intent_pattern | L3 responds successfully | Drain worker | build_intent_pattern_prompt (strengthens clustering) |
| tool_error | failure_pattern | L3 returns an error | Drain worker | build_failure_pattern_prompt |
| llm_turn | (no trigger) | Every LLM cycle inside oracle | | Too granular; feeds graphs only |
| final_response | (no trigger) | Response about to reach L1 | | Informational; feeds graphs + lineage |
| user_feedback | (future) | Admin endpoint or feedback submission | | Evaluator-only today |
| Admin-initiated | failure_pattern | Admin hits POST /learn | admin.api.trigger_learn | build_failure_pattern_prompt |

Why llm_turn and final_response don't trigger: they're intermediate observations. Every request produces exactly one user_prompt → many llm_turn → many tool_output/tool_error → one final_response. Triggering on every llm_turn would thrash the LLM. tool_output and tool_error do trigger, but per-trace coalescing collapses their bursts, which keeps synthesis close to a once-per-turn cadence.

Drift gating — the four checks

Every LLM proposal passes through drift.run_all(...) before it can become an approved artifact. The call path:

SynthesisEngine._record_proposal(proposal, subgraph)
  │
  └─▶ drift_module.run_all(
        proposal_id=<uuid>,
        proposal=<LLM's JSON output>,
        outcome_graph_edges=subgraph["outcome_graph_edges"],
        intent_graph_nodes=subgraph["intent_graph_nodes"],
        existing_weights=subgraph.get("existing_weights", []),
        trajectory=subgraph.get("trajectory"))
      │
      ├─▶ check_proposed_pattern_vs_edges(...)              # n/a if non-FailurePattern
      ├─▶ check_intent_classification_vs_clusters(...)      # n/a if non-IntentPattern
      ├─▶ check_weight_swings(...)                          # passes if <2 priors
      └─▶ check_trajectory_coherence(...)                   # passes if no trajectory
      │
      └─▶ DriftCheckResult(approved = all passed)

Why deterministic: drift checks must not themselves call the LLM — that would defeat the point. They are pure math against graph state.

What happens on failure: the proposal is still recorded on the pending list, but status="drift_flagged" and drift_checks[*] carries per-check detail so admins can see exactly which anchor was violated. M9's Curator will refuse to commit flagged proposals autonomously (admin review required).
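The approve-or-flag aggregation can be sketched as below. Only the aggregation rule (approved when all checks pass, otherwise drift_flagged with per-check detail) comes from the text; the CheckResult shape, the mean-of-priors baseline, and the max_swing threshold of 0.5 are assumptions invented for this example.

```python
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def check_weight_swings(proposed_weight: float, existing_weights: List[float],
                        max_swing: float = 0.5) -> CheckResult:
    """Pure arithmetic against prior weights; passes with fewer than 2 priors."""
    if len(existing_weights) < 2:
        return CheckResult("weight_swings", True, "fewer than 2 priors")
    baseline = sum(existing_weights) / len(existing_weights)
    swing = abs(proposed_weight - baseline)
    return CheckResult("weight_swings", swing <= max_swing, f"swing={swing:.3f}")


def run_all(checks: List[CheckResult]) -> dict:
    """Approve only when every check passed; keep per-check detail either way."""
    approved = all(c.passed for c in checks)
    return {
        "approved": approved,
        "status": "approved" if approved else "drift_flagged",
        "drift_checks": [asdict(c) for c in checks],
    }
```

No step here calls a model, which is the property the section asks for: the same inputs always yield the same verdict.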

Admin call paths

Every admin endpoint routes through the same require_admin FastAPI dependency.

admin.api.require_admin(request) -> payload
  │
  ├─▶ payload = request.state.token_payload                  (from OAuthMiddleware)
  ├─▶ if "admin" not in payload.roles: raise 403
  └─▶ return payload

Endpoint call graph

| Endpoint | Handler | Calls | Returns |
|---|---|---|---|
| GET /memories | admin.api.list_memories | ApolloObservations().read(query) | Filtered observation list |
| GET /memories/{uid} | admin.api.get_memory | ApolloObservations().read(uid=uid) | One doc or 404 |
| POST /memories | admin.api.seed_memory | ingest.ingest(envelope) (primary path, admin-stamped caller_identity) | {accepted, trace_id, event_type} |
| PATCH /memories/{uid} | admin.api.patch_memory | ApolloObservations().update({tags, admin_note}, uid) | {patched, uid, fields} |
| DELETE /memories/{uid} | admin.api.forget_memory | ApolloObservations().delete(uid) | {forgotten, uid} |
| GET /artifacts | admin.api.list_artifacts | ApolloArtifacts().read(match_all) + SynthesisEngine().pending_snapshot() | {active: [...], pending: [...], count: {active, pending}} |
| GET /audit | admin.api.list_audit | ApolloAudit().read(query) with term filters on action/actor/artifact_id + timestamp range | Audit records sorted timestamp-desc |
| GET /recommendations | admin.api.list_recommendations | RecommendationQueue().snapshot() | {recommendations: [...], count: N} |
| POST /artifacts/{id}/promote | admin.api.promote_artifact | Fetches proposal from SynthesisEngine._pending → curator.promote(...) | ActionResult.to_dict() |
| POST /artifacts/{id}/demote | admin.api.demote_artifact | curator.demote(...) with optional evaluator_score / score_decomposition / upstream_artifact_ids threaded into audit | ActionResult.to_dict() |
| POST /artifacts/{id}/rollback | admin.api.rollback_artifact | curator.rollback(artifact_id, target_version, ...) | ActionResult.to_dict() |
| PATCH /artifacts/{id} | admin.api.edit_artifact | curator.edit(artifact_id, content_patch, applicability_patch, ...) | ActionResult.to_dict() |
| DELETE /artifacts/{id} | admin.api.forget_artifact | curator.forget(artifact_id, ...) | ActionResult.to_dict() |
| GET /guidance?scope=l1 | admin.api.preview_guidance | attacher.for_l1(...) | Current attachable payload for that scope |
| GET /guidance?scope=l3:<svc> | admin.api.preview_guidance | attacher.for_l3_agent(service_name=..., tool_name=None) | Current L3 payload |
| GET /subscribers | admin.api.list_subscribers | SSEHub().subscribers_snapshot() | Connected SSE clients |
| GET /guidance/stream | admin.api.guidance_stream | SSEHub().subscribe(scope) then streams via sse_event_stream | text/event-stream |
| POST /learn | admin.api.trigger_learn | SynthesisEngine().schedule_admin_initiated(scope=...) | {accepted: true, scope} |
| POST /chat (admin, action: invoke) | chat.server.admin_chat | chat.tools.TOOL_IMPLEMENTATIONS[tool](**arguments) | Tool result |
| POST /chat (admin, action: chat) | chat.server._run_chat_loop | Apollo LLM loop → TOOL_IMPLEMENTATIONS[picked_tool](**args) per iteration → final prose | {response, tool_calls, iterations, conversation_id} |
| POST /chat (admin, action: list_tools) | chat.server.admin_chat | chat.tools.TOOL_DEFINITIONS | Tool catalog for UI rendering |
| GET /audit filter params (M13) | admin.api.list_audit | Adds trigger (term) + artifact_type (prefix on artifact_id) | Filtered records sorted timestamp-desc |
| GET /stats (M13 polish) | guidance.api.get_stats | Returns metric snapshot + degraded_emitters (via maintenance.degraded_emitters()) + intent_schema_coverage (via maintenance.intent_schema_coverage(24)) | Status + metrics + degraded list + coverage fraction |
| POST /observations | guidance.api.post_observations | ingest.ingest(envelope) per item | {accepted, dropped} |
| GET /stats | guidance.api.get_stats | metrics.snapshot() | Every Prometheus counter's current values |

SSE fan-out (driven by Curator commits from M9+)

curator.actions.promote/demote/forget/edit/rollback(...)
  │
  ├─▶ allow_or_raise(ActionRequest)                          (policy gate)
  ├─▶ _copy_current_to_history(artifact_id)                  (prior version → apollo_artifact_history)
  ├─▶ ApolloArtifacts.create(new_record, uid=artifact_id)    (store mutation)
  ├─▶ write_audit(ApolloAuditRecord(...))                    (apollo_audit)
  │
  └─▶ _broadcast_commit(action, artifact_id, actor)
        │
        └─▶ SSEHub().broadcast(
              {event: "curator_commit", action, artifact_id, actor, ts},
              scope="*")
              │
              └─▶ for sub in self._subs.values():
                    if scope matches or sub.scope == "*":
                      sub.queue.put_nowait(event)
                        ↳ if QueueFull: drop (slow consumer reconnects)

A subscribed admin client's streaming response pulls from sub.queue and yields text/event-stream bytes. SSE broadcast failures are swallowed by _broadcast_commit: the durable state (store + audit) is what matters; the SSE feed is cosmetic.
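The fan-out with drop-on-full can be sketched as a small in-memory hub. The Hub class is a hypothetical stand-in for SSEHub, and the scope rule used here (an event scoped "*" reaches everyone, a subscriber scoped "*" receives everything, otherwise scopes must be equal) is an assumption about what "scope matches" means.

```python
import asyncio
from typing import Dict, Tuple


class Hub:
    """Hypothetical stand-in for SSEHub with per-subscriber bounded queues."""

    def __init__(self, queue_size: int = 100) -> None:
        self._queue_size = queue_size
        self._subs: Dict[str, Tuple[str, asyncio.Queue]] = {}
        self.dropped_total = 0

    def subscribe(self, sub_id: str, scope: str) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue(maxsize=self._queue_size)
        self._subs[sub_id] = (scope, queue)
        return queue

    def broadcast(self, event: dict, scope: str = "*") -> int:
        """Deliver to matching subscribers; return how many accepted the event."""
        delivered = 0
        for sub_scope, queue in self._subs.values():
            if scope != "*" and sub_scope not in ("*", scope):
                continue
            try:
                queue.put_nowait(event)
                delivered += 1
            except asyncio.QueueFull:
                self.dropped_total += 1  # slow consumer; it can reconnect
        return delivered
```

A full queue affects only its own subscriber, so one slow admin client cannot stall a Curator commit or other subscribers.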

Evaluator call graph (Milestone 10)

Every qualifying observation that the drain worker processes runs through this pipeline before the worker moves on:

drain_worker
  │ (after extractors + graph update)
  │
  └─▶ _evaluate_envelope(envelope)
        │
        ├─▶ AttributionRegistry.applied_for(trace_id, service_name=...)
        │     └─▶ returns [artifact_id, ...] that were attached to this trace
        │         (empty when the attacher never recorded anything for the
        │          trace — common pre-M9 / during bootstrap; skip early)
        │
        ├─▶ signals.detect_signals(envelope, applied_artifact_ids=...)
        │     │
        │     ├─ TOOL_ERROR            → L3_ERROR signals
        │     ├─ TOOL_OUTPUT           → SCHEMA_MISMATCH (agent-only) +
        │     │                          EVALUATOR_CONFIDENCE (if gap > 0)
        │     ├─ USER_FEEDBACK         → USER_FEEDBACK (negative sentiments)
        │     └─ FINAL_RESPONSE + gap → EVALUATOR_CONFIDENCE
        │
        │   Library services are DARK for SCHEMA_MISMATCH (Q9).
        │   component_kind is looked up via ServiceRegistry at signal time.
        │
        └─▶ for each SignalHit:
              ├─▶ ScoringEngine.apply_signal(artifact_id, kind, magnitude)
              │     └─▶ EMA pull toward per-tier asymptote
              │         increments signal_counts / magnitude_totals
              │         tracks l3_dominant_ticks + ticks_below_threshold
              │         appends to l3_timestamps ring buffer
              │
              ├─▶ cascade.cascade_on_l3_dominant(engine, artifact_id)
              │     └─▶ CascadeOutcome with action in
              │         {none, drift_event, recommend_fast_demote, recommend_demote}
              │         and `upstream_flagged` list
              │
              └─▶ if outcome.action != "none":
                     RecommendationQueue.add(
                       Recommendation(
                         artifact_id,
                         kind={"recommend_demote": "demote",
                               "recommend_fast_demote": "fast_demote",
                               "drift_event": "drift_event"}[action],
                         reason=outcome.reason,
                         evaluator_score=score.score,
                         score_decomposition=score.decomposition(),
                         upstream_artifact_ids=outcome.upstream_flagged,
                       ))
                     audit.info("oracle.evaluator.recommendation ...")

Queue replace-semantics. A fresh recommendation for an artifact replaces any prior recommendation for the same artifact — the queue holds the latest high-water snapshot, not an append log. Admins always see the current score, not a stale one.

Cleared by action. When an admin calls curator.demote() with a recommendation's evaluator fields, the demote action removes the recommendation from the queue. No duplicate recommendations appear after the admin acts.
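Replace-semantics and clear-on-action amount to a map keyed by artifact id. The sketch below uses plain dicts for recommendations; the class body and the clear method name are assumptions, not the actual RecommendationQueue.

```python
from typing import Dict, List


class RecommendationQueue:
    """Holds at most one (the latest) recommendation per artifact."""

    def __init__(self) -> None:
        self._by_artifact: Dict[str, dict] = {}

    def add(self, recommendation: dict) -> None:
        # A fresh recommendation replaces any prior one for the same artifact.
        self._by_artifact[recommendation["artifact_id"]] = recommendation

    def clear(self, artifact_id: str) -> bool:
        """Called after an admin acts; returns True if something was removed."""
        return self._by_artifact.pop(artifact_id, None) is not None

    def snapshot(self) -> List[dict]:
        return list(self._by_artifact.values())
```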

Admin-chat LLM loop (Milestone 11)

Natural-language admin prompt → LLM picks tool → tool runs → LLM reads result → LLM composes final prose. The loop is bounded at _MAX_CHAT_ITERATIONS = 6 so a misbehaving LLM can't thrash.

POST /chat {"action": "chat", "message": "why did you demote pshim_xyz?"}
  │
  └─▶ chat.server._run_chat_loop(message, actor)
        │
        ├─▶ messages = [{"role": "user", "content": "Tool catalog:\n... Admin message: why did you demote pshim_xyz?"}]
        │
        └─▶ for iteration in range(_MAX_CHAT_ITERATIONS):
              │
              ├─▶ LLMClient.get().complete(system=_SYSTEM_PROMPT, messages, response_format="json")
              │     └─▶ returns LLMResponse with parsed JSON
              │
              ├─▶ parsed["action"] == "call_tool":
              │     ├─▶ impl = TOOL_IMPLEMENTATIONS[parsed["tool"]]
              │     ├─▶ result = impl(actor="admin:<username>", **parsed["arguments"])
              │     │     │   (mutations raise CuratorPaused if pause flag is on →
              │     │     │    caught and surfaced as {"ok": False, "error": "curator_paused"})
              │     ├─▶ tool_trail.append({"tool": ..., "result": result})
              │     ├─▶ messages.append(assistant_turn + tool_result_turn)
              │     └─▶ continue loop
              │
              ├─▶ parsed["action"] == "respond":
              │     └─▶ return {"response": parsed["text"], "tool_calls": tool_trail, ...}
              │
              └─▶ malformed JSON → append nudge, try again until budget exhausts

Every mutation picked by the LLM flows through the M9 Curator atomic sequence (policy gate → history → store → audit → SSE broadcast), using trigger="admin_chat" on the audit record to distinguish chat-driven actions from direct REST calls.
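A compact sketch of the bounded loop follows. The complete callable stands in for the LLM client and returns raw text, the nudge wording is invented, and returning response=None when the budget runs out is an assumption, since the text does not say what the real loop returns in that case.

```python
import json
from typing import Callable, Dict, List

MAX_CHAT_ITERATIONS = 6  # mirrors _MAX_CHAT_ITERATIONS in the text
_NUDGE = {"role": "user", "content": "Reply with one JSON object: call_tool or respond."}


def run_chat_loop(message: str, complete: Callable[[List[dict]], str],
                  tools: Dict[str, Callable[..., dict]], actor: str) -> dict:
    messages: List[dict] = [{"role": "user", "content": message}]
    tool_trail: List[dict] = []
    for iteration in range(MAX_CHAT_ITERATIONS):
        raw = complete(messages)
        try:
            parsed = json.loads(raw)
        except ValueError:
            parsed = None
        if isinstance(parsed, dict) and parsed.get("action") == "respond":
            return {"response": parsed.get("text"), "tool_calls": tool_trail,
                    "iterations": iteration + 1}
        if (isinstance(parsed, dict) and parsed.get("action") == "call_tool"
                and parsed.get("tool") in tools):
            result = tools[parsed["tool"]](actor=actor, **(parsed.get("arguments") or {}))
            tool_trail.append({"tool": parsed["tool"], "result": result})
            messages.append({"role": "assistant", "content": raw})
            messages.append({"role": "user", "content": json.dumps({"tool_result": result})})
            continue
        messages.append(_NUDGE)  # malformed JSON or unknown action: nudge, retry
    return {"response": None, "tool_calls": tool_trail, "iterations": MAX_CHAT_ITERATIONS}
```

Malformed replies consume budget like any other iteration, which is what keeps a misbehaving model from looping forever.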

Curator pause gate (Milestone 11)

oracle.curator.pause._state: PauseState
  │
  ├─▶ set_paused(actor, rationale)
  │     ├─▶ writes ApolloAuditRecord(action=PAUSE_CURATOR, indefinite=True)
  │     └─▶ SSEHub().broadcast({event: curator_paused, by, reason, ts}, scope="*")
  │
  ├─▶ clear_paused(actor, rationale)
  │     ├─▶ writes ApolloAuditRecord(action=RESUME_CURATOR, indefinite=True,
  │     │     evidence_ref={prior_paused_by, prior_paused_reason})
  │     └─▶ SSEHub().broadcast({event: curator_resumed, by, ts}, scope="*")
  │
  └─▶ raise_if_paused()
        └─▶ called at the TOP of every mutation in apollo/curator/actions.py
            plus rollback_graph and trigger_synthesis in apollo/chat/tools.py
            raises CuratorPaused(state) when flag is on

Mutation coverage. promote, demote, forget, edit, rollback (curator); rollback_graph, trigger_synthesis (chat tools). resume_curator itself is intentionally NOT gated — that's how admins get out of pause.
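The gate pattern can be sketched in a few lines. Audit writes and SSE broadcasts are omitted, and the promote and safe_promote functions are hypothetical placeholders for a gated mutation and for the chat-tool wrapper that surfaces the pause as an error payload.

```python
from dataclasses import dataclass
from typing import Optional


class CuratorPaused(Exception):
    pass


@dataclass
class PauseState:
    paused: bool = False
    by: Optional[str] = None
    reason: Optional[str] = None


_state = PauseState()


def set_paused(actor: str, rationale: str) -> None:
    _state.paused, _state.by, _state.reason = True, actor, rationale


def clear_paused(actor: str, rationale: str) -> None:
    # Intentionally NOT gated: this is how admins get out of pause.
    _state.paused, _state.by, _state.reason = False, None, None


def raise_if_paused() -> None:
    if _state.paused:
        raise CuratorPaused(f"paused by {_state.by}: {_state.reason}")


def promote(artifact_id: str, actor: str) -> dict:
    raise_if_paused()  # first statement of every mutation
    return {"ok": True, "action": "promote", "artifact_id": artifact_id, "actor": actor}


def safe_promote(artifact_id: str, actor: str) -> dict:
    try:
        return promote(artifact_id, actor)
    except CuratorPaused:
        return {"ok": False, "error": "curator_paused"}
```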

Autonomous Curator sweep (Milestone 12)

A background task started from oracle.app.startup runs every APOLLO_CURATOR_AUTO_INTERVAL_SEC (default 30) seconds. On each tick:

oracle.curator.auto.sweep_once()
  │
  ├─▶ if not settings.APOLLO_CURATOR_AUTONOMOUS: return {ran: False, reason: "disabled"}
  ├─▶ if is_paused(): return {ran: False, reason: "paused"}
  │
  ├─▶ for rec in SynthesisEngine().pending_snapshot():
  │     if rec.status == "approved":
  │       artifact_id = derive_artifact_id(rec.proposal)
  │       ├─▶ fp_<hash> for FailurePattern, ip_<hash> for IntentPattern,
  │       │   ps_<hash> for PromptShim, sf_<hash> for SpecFragment
  │       │   — same proposal body → same id (versioning accumulates)
  │       └─▶ curator.promote(artifact_id=..., actor="curator_auto",
  │                            trigger="autonomous_curator", evidence_ref={autonomous: True})
  │           └─▶ same M9 atomic sequence — raise_if_paused → policy_gate →
  │               history → store → audit → SSE broadcast
  │     elif rec.status == "drift_flagged":
  │       drift_retained += 1  (admin must review)
  │
  └─▶ for queued in RecommendationQueue().snapshot():
        if queued.kind in ("demote", "fast_demote"):
          curator.demote(artifact_id=..., actor="curator_auto",
                         trigger="autonomous_curator",
                         evaluator_score, score_decomposition,
                         upstream_artifact_ids)
        elif queued.kind == "drift_event":
          drift_retained += 1  (admin must review)

Evolution-class vs drift-class. Evolution work (approved proposals + demote/fast_demote recommendations) auto-commits because it's mechanical — the drift check already validated the proposal; the evaluator already quantified the regression. Drift-class work (drift_flagged proposals + drift_event recommendations) is explicitly retained for admin review because it reflects a divergence the autonomous path shouldn't resolve on its own.

Mid-sweep pause. If an admin pauses between the sweep's flag check and one of its mutations, the individual promote / demote call raises CuratorPaused; sweep_once catches it and returns reason: "paused_mid_sweep" with whatever it committed up to that point. No split-brain state.
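The "same proposal body → same id" property of derive_artifact_id can be obtained by hashing a canonical serialization. The sketch below is one way to do that, not the actual function: the proposal fields artifact_type and body, the use of SHA-256, and the 12-character truncation are all assumptions; only the four prefixes come from the text.

```python
import hashlib
import json

_PREFIX = {
    "FailurePattern": "fp",
    "IntentPattern": "ip",
    "PromptShim": "ps",
    "SpecFragment": "sf",
}


def derive_artifact_id(proposal: dict) -> str:
    """Deterministic id: type prefix plus a hash of the canonical body."""
    prefix = _PREFIX[proposal["artifact_type"]]
    # sort_keys makes the serialization independent of dict key order.
    canonical = json.dumps(proposal["body"], sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
    return f"{prefix}_{digest}"
```

Because the id depends only on type and body, re-promoting an unchanged proposal lands on the same artifact and adds a version instead of creating a sibling.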

Maintenance loop (Milestone 13)

A second background task from oracle.app.startup runs every APOLLO_MAINTENANCE_INTERVAL (default 1 hour):

oracle.maintenance.run_once()
  │
  ├─▶ _purge_expired()
  │     └─▶ for alias in (apollo_observations, apollo_audit, apollo_graph_snapshots):
  │           store.delete_by_query({"range": {"expires_ts": {"lt": now}}})
  │           — indefinite records have expires_ts=null → skipped by the range query
  │
  ├─▶ _coarsen_snapshots()  # tier generation deferred; hook in place
  │
  └─▶ _emit_metrics()
        ├─▶ MAINTENANCE_LAST_RUN_TS.set(now)
        └─▶ MAINTENANCE_DOCS_DELETED_TOTAL.labels(index=alias).inc(n)

/stats read-side helpers:

oracle.maintenance.degraded_emitters()
  └─▶ scan metrics.INGEST_LAST_INGEST_TS samples
      flag any service whose timestamp is > APOLLO_INGEST_STALE_WARN_SEC old
      return [{"service", "last_ingest_ts", "seconds_since"}, ...]

oracle.maintenance.intent_schema_coverage(window_hours=24)
  └─▶ count observations in the window; count intent_schema within them
      return count_intent / count_all (None when window is empty)
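Both read-side helpers reduce to small pure functions once the inputs are in hand. In this sketch the inputs are passed explicitly (a service → timestamp map and a list of event types) instead of being read from metrics or the store; those parameter shapes are assumptions.

```python
from typing import Dict, List, Optional


def degraded_emitters(last_ingest_ts: Dict[str, float], stale_warn_sec: float,
                      now: float) -> List[dict]:
    """Flag services whose last ingest is older than the stale threshold."""
    flagged = []
    for service, ts in sorted(last_ingest_ts.items()):
        seconds_since = now - ts
        if seconds_since > stale_warn_sec:
            flagged.append({"service": service, "last_ingest_ts": ts,
                            "seconds_since": seconds_since})
    return flagged


def intent_schema_coverage(event_types: List[str]) -> Optional[float]:
    """Fraction of observations in the window that are intent_schema events."""
    if not event_types:
        return None  # empty window: no meaningful fraction
    count_intent = sum(1 for e in event_types if e == "intent_schema")
    return count_intent / len(event_types)
```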

Curator mutation atomic sequence

Every action in oracle.curator.actions follows the same five-step shape:

  1. allow_or_raise(request)                  # policy gate — fails closed
  2. _copy_current_to_history(artifact_id)    # prior → apollo_artifact_history
  3. ApolloArtifacts.create(new_record)       # overwrite current with new version
  4. write_audit(ApolloAuditRecord(...))      # audit record with required rationale
  5. _broadcast_commit(action, ...)           # SSE fan-out to admin subscribers

If step 1 raises, nothing downstream runs. If any later step fails, the partial state is what operators will see on GET /artifacts + GET /audit, but the policy gate guarantees that a blocked action leaves zero durable state. In practice, steps 2–4 are issued back to back in the same event loop iteration, but they are separate writes rather than one transaction: an ES outage at step 3 will still have written step 2's history record (acceptable, because history is designed to accumulate).
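The five-step ordering can be made concrete with in-memory stand-ins. Everything here is illustrative: commit_mutation, PolicyDenied, the allow callback, and the dict/list stores are hypothetical substitutes for allow_or_raise, the ES-backed stores, and the SSE hub.

```python
from typing import Callable, Dict, List


class PolicyDenied(Exception):
    pass


def commit_mutation(action: str, artifact_id: str, new_record: dict, rationale: str, *,
                    allow: Callable[[str, str], bool], store: Dict[str, dict],
                    history: List[dict], audit: List[dict],
                    broadcast: Callable[[dict], None]) -> dict:
    # 1. Policy gate: fails closed, before any durable write.
    if not allow(action, artifact_id):
        raise PolicyDenied(f"{action} on {artifact_id} denied")
    # 2. Prior version goes to history (when one exists).
    if artifact_id in store:
        history.append({"artifact_id": artifact_id, "record": store[artifact_id]})
    # 3. Overwrite current with the new version.
    store[artifact_id] = new_record
    # 4. Audit record with the required rationale.
    audit.append({"action": action, "artifact_id": artifact_id, "rationale": rationale})
    # 5. Best-effort fan-out; failures are swallowed.
    try:
        broadcast({"event": "curator_commit", "action": action, "artifact_id": artifact_id})
    except Exception:
        pass
    return {"ok": True, "action": action, "artifact_id": artifact_id}
```

Putting the gate first is what gives the "zero durable state on a blocked action" guarantee; nothing in steps 2 through 5 is reachable once it raises.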

Trace propagation — who sets, who reads

Set (ingress)

TraceparentMiddleware:
  │
  ├─▶ header = scope["headers"]["traceparent"]
  ├─▶ ctx = parse_traceparent(header)
  │
  ├─▶ if ctx is None:
  │     if APOLLO_REQUIRE_TRACEPARENT: return 400
  │     if header: metrics.MALFORMED_TRACEPARENT_TOTAL.inc()
  │     else:     metrics.MISSING_TRACEPARENT_TOTAL.inc()
  │     ctx = mint_traceparent()                              (local mint)
  │
  └─▶ set_current_traceparent(ctx.format())

Read (outbound)

Every place that needs the trace reads the same ContextVar:

| Call site | Purpose |
|---|---|
| routes.chat | Pass trace_id to emit_user_prompt / emit_final_response |
| ToolExecutor emit loop | Pass trace_id to emit_llm_turn / emit_tool_output / emit_tool_error |
| _call_backend_tool headers | Add traceparent header to outbound httpx POST |
| mcp/server._proxy headers | Same |
| extract_http_headers | Forward to any gateway client (MCPClient / RestClient) |
| ApolloClient._post_with_retry | Stamp traceparent header on secondary-path POSTs |

Every envelope oracle emits for one request carries the same trace_id. That's what lets the admin lineage query stitch observations across layers.

Logging + auditing — three channels

Every Apollo module imports these three loggers from axonis.core.logger:

| Logger | Purpose | Rotating file |
| --- | --- | --- |
| log | Routine telemetry (info, warning, debug) | oracle.log |
| error | Exceptions, permanent failures, data-loss events | error.log |
| audit | Important transactions that must be independently traceable | audit.log |

What counts as audit.info():

| Event | Emitted from |
| --- | --- |
| oracle.admin.memory_seeded / memory_patched / memory_forgotten | M7: admin mutations |
| oracle.admin.learn_requested | M8: POST /learn |
| oracle.chat.list_tools / oracle.chat.invoke | M7: admin chat |
| oracle.synthesis.admin_initiated | M8: admin-driven synthesis pass |
| oracle.synthesis.proposal_recorded | M8: every proposal (approved + drift_flagged) |
| oracle.llm.minimax_local_loaded | M8: first successful load of the local HF checkpoint |
| oracle.curator.audit action=<kind> actor=... artifact=... | M9: every Curator mutation (promote / demote / forget / edit / rollback); one line per audit record written |
| oracle.curator.promoted / .demoted / .forgotten / .edited / .rolled_back | M9: mutation-kind-specific info lines (operational telemetry, not primary audit) |
| oracle.evaluator.recommendation artifact=<id> kind=<demote\|fast_demote\|drift_event> score=<float> reason=<str> | M10: every evaluator recommendation landing on the queue |
| oracle.chat.session_start actor=admin:<name> conv=<id> / session_end / session_timeout | M11: admin chat session lifecycle |
| oracle.chat.tool_call actor=admin:<name> tool=<name> iter=<n> | M11: every tool the LLM picks inside a chat loop |
| oracle.chat.trigger_synthesis / .rollback_graph / .list_tools / .invoke | M11: admin-chat operation audit lines |
| oracle.curator.audit action=pause_curator \| resume_curator artifact=curator:state | M11: pause/resume indefinite audit records |
| oracle.curator_auto.promoted artifact=<id> version=<n> proposal=<id> | M12: every autonomous promote |
| oracle.curator_auto.demoted artifact=<id> kind=<demote\|fast_demote> score=<float> | M12: every autonomous demote from evaluator recommendation |
| oracle.curator_auto.sweep promoted=<n> demoted=<n> drift_retained=<n> | M12: per-sweep summary (only logged when anything changed) |
| oracle.maintenance.completed purged=<dict> coarsened=<dict> | M13: every maintenance pass |

Telemetry — who increments what

All counters are registered at startup with zero values (so dashboards work before traffic). Every counter has labels by service/event type/kind where applicable.

| Counter | Incremented by | When |
| --- | --- | --- |
| apollo_ingest_accepted_total{service, event_type} | ingest.ingest() | Envelope successfully enqueued |
| apollo_ingest_queue_dropped_total{service} | ingest.ingest() | Queue full on put_nowait |
| apollo_ingest_dedupe_total{service} | Drain worker | Observation inside dedup window |
| apollo_ingest_queue_depth | Drain worker | Updated after every get |
| apollo_ingest_last_ingest_ts{service} | ingest.ingest() | On successful enqueue (both paths) |
| apollo_ingest_last_drain_ts{service} | Drain worker | After graph update step |
| apollo_ingest_post_failure_total{service, kind} | ApolloClient._post_with_retry | Secondary-path POST failed after retries |
| apollo_ingest_worker_failure_total{service} | Drain worker | Write/graph path failed after retries |
| apollo_guidance_attach_timeout_total{scope} | attacher.for_l1 / for_l3_agent | Attach budget overshoot |
| apollo_missing_traceparent_total | TraceparentMiddleware | No header on inbound request |
| apollo_malformed_traceparent_total | TraceparentMiddleware | Header present but unparseable |

M9 additions — the Curator does not introduce new Prometheus counters in M9; visibility into mutations is through the audit log + SSE feed + admin endpoints.

M13 additions:

| Counter | Incremented by | When |
| --- | --- | --- |
| apollo_maintenance_last_run_ts | maintenance._emit_metrics | End of each maintenance pass |
| apollo_maintenance_docs_deleted_total{index} | maintenance._emit_metrics | After each delete_by_query sweep |

Counters are scraped through GET /api/v1/apollo/stats (JSON) and /api/v1/oracle/metrics (Prometheus text format via oracle's existing endpoint).

Failure modes — what degrades how

| Failure | Apollo effect | L1 / L3 effect |
| --- | --- | --- |
| Ingest queue full | apollo_ingest_queue_dropped_total increments; observation lost | None; request continues |
| Drain worker crashes mid-write | At-least-once retry; after budget, dead-letter JSONL (optional) + counter | None; request already returned |
| Apollo module import fails at startup | Oracle serves /chat without apollo_guidance on the response; MCP dispatches go out without the guidance field; POST /observations returns 503 | L1 sees response without the optional field; L3 gets no guidance but otherwise runs normally |
| attacher.for_l1 / for_l3_agent exceeds timeout | apollo_guidance_attach_timeout_total increments; field omitted from response/dispatch | Same; clients ignore missing optional field |
| LLM call fails inside synthesis | Proposal not recorded; error.error("oracle.synthesis.trace_failed") logged | None; synthesis is background work |
| LLM returns malformed JSON | Proposal dropped silently from pending list; error.error("oracle.synthesis.bad_json") | None |
| Drift check flags proposal | status="drift_flagged" on pending record; admin sees it under GET /artifacts | None |
| OAuthMiddleware rejects token | 401 returned | Standard oracle behavior |
| TraceparentMiddleware in required mode + missing header | 400 returned before handler | L1 must include traceparent to proceed |

Invariant: no failure in Apollo can reach the user's /chat response path with anything worse than "the optional apollo_guidance field isn't there."

What's not wired yet (by milestone)

| Feature | Unblocks | Milestone |
| --- | --- | --- |
| Curator persists approved proposals to apollo_artifacts with version history | Artifact-driven guidance instead of empty sets | ✅ M9 |
| Admin PATCH /artifacts/{id} / promote / demote / rollback | Admin can act on Apollo's proposals | ✅ M9 |
| apollo_audit records with rationale + evidence_ref | Audit review surface | ✅ M9 |
| SSE fan-out on Curator commits | Live admin visibility into mutations | ✅ M9 |
| Evaluator scoring + L3-amplified demotion | Artifacts decay when they stop correlating | ✅ M10 |
| GET /api/v1/apollo/recommendations for admin review of evaluator verdicts | Admin can see which artifacts the evaluator wants demoted | ✅ M10 |
| Demote audit records carry evaluator_score + score_decomposition + upstream_artifact_ids | Audit trail explains why each demote happened | ✅ M10 |
| Admin-chat LLM loop with function-calling-style tool selection | Conversational admin surface | ✅ M11 |
| Full admin tool catalog (promote/demote/rollback/forget/edit/graph rollback/synthesis trigger/pause/resume) | Every admin mutation reachable via chat | ✅ M11 |
| explain_decision / discuss_decision surface audit + evidence in-chat | Admin can ask "why did you do this?" in plain English | ✅ M11 |
| Curator pause gate on every mutation | Emergency off-switch | ✅ M11 |
| Autonomous Curator auto-commit loop (evolution-class) | Curator commits without admin intervention | ✅ M12 |
| Deterministic artifact-id derivation for auto-promoted proposals | Versioning accumulates on one artifact instead of proliferating | ✅ M12 |
| Hourly maintenance + delete_by_query purge | Expired observations/audits cleaned up | ✅ M13 |
| /stats degraded_emitters + intent_schema_coverage | Phase-3 readiness (required-mode flip substrate) | ✅ M13 |
| GET /audit extended with trigger + artifact_type filters | Admin can narrow by mutation origin and artifact kind | ✅ M13 |
| Cortex (L3) consumes apollo_guidance via request-scoped ApolloGuidanceCache | L3 LLM calls inside cortex tools observably change when guidance is attached | ✅ M14 |
| apollo_guidance popped from MCP arguments before tool dispatch; contextvar reset on return / exception | No cross-request leakage of cache state | ✅ M14 |
| Beacon (L1) consumes apollo_guidance via session-scoped ApolloGuidanceCache | L1 LLM prompts include attached PromptShims / FailurePatterns / etc. | ⏸ Deferred: gated on a beacon↔oracle connection (no path today) |
| Parallax onboarding (subscriber + emitter pattern, mirrors cortex) | Parallax-driven workflows visible to + steered by Apollo | ⏸ Deferred: same pattern as cortex when it onboards |
| APOLLO_LLM_LOCAL_MODEL_PATH + thread-pool offload + device mapping for minimax-local | Production-grade local inference | Deferred (post-M14) |

Design Journey

A human-readable walk-through of Apollo's design and what each completed milestone delivered. Written for presentation audiences, not as a reference spec. The full technical contract is the rest of this spec; the build order lives in §Implementation Plan. This section is a narrative: what we are trying to accomplish, how the pieces fit together, and the order in which they came online.

What Apollo is

Apollo is an observation, learning, and guidance layer that lives inside oracle. It watches every request/response flowing through the platform, records what actually happened, reasons about what is working and what isn't, and feeds that reasoning back into the system as guidance — attached to the next response or dispatch, so the guidance reaches the LLMs that need it at the exact moment they need it.

Three goals:

  1. Observe. Record every meaningful event in the platform with enough lineage that an operator or an automated evaluator can reconstruct "what happened on this request" days or weeks later.
  2. Ground. Turn that stream of events into deterministic graphs — no LLM reasoning, just accounting — so the system has an objective ledger of reality.
  3. Advise. Reason over the graphs (with an LLM, bounded by graph-anchor drift checks) to propose improvements to the prompts, tool routing, failure handling, and intent classifications the platform uses — then attach those improvements to the next turn.

Apollo is internal — it has no external surface. Oracle already fronts the platform; Apollo runs as a package inside oracle.

The architectural invariant (the single rule)

The question that drove the most design iteration was: who talks to whom? The answer:

Neither L1 nor L3 ever addresses Apollo directly. Both talk to oracle. Oracle talks to Apollo.

This is invariant #14 in the design spec. Every piece of the system — ingest, guidance delivery, synthesis, admin tooling — respects it. Here is the full flow in six steps:

  1. L1  ──────────────▶  Oracle          (user's /chat request)
  2.                      Oracle  ──▶  Apollo   (in-process emit: user_prompt, intent_schema)
  3.                      Oracle  ◀──  Apollo   (in-process: apollo_guidance payload)
  4.                      Oracle  ──────────────▶  L3      (MCP dispatch + apollo_guidance)
  5.                      Oracle  ◀──────────────  L3      (tool response)
                          Oracle  ──▶  L1               (/chat response + apollo_guidance)
  6.                      Oracle  ──▶  Apollo   (in-process emit: tool_output / tool_error / final_response)

Consequences of this rule:

  • Neither L1 (beacon, browser clients) nor L3 (cortex; parallax in a later phase) holds any Apollo credentials, endpoint knowledge, or client code.
  • ApolloClient (the HTTP emitter in axonis-core) exists but is reserved for admin replay/seed and out-of-process emitters — services running outside oracle's MCP-dispatch reach. Phase-1 emitters (oracle + cortex) never use it.
  • Apollo cannot be "down" independently of oracle — they share a process. If Apollo fails, oracle degrades gracefully (responses serialize without apollo_guidance); if oracle is down, Apollo is moot anyway.

The layered view

┌──────────────────────────────────────────────────────────────┐
│  Layer 1 (L1): Front-facing UI / clients                     │
│    e.g., beacon, browser clients                             │
│    - composes prompts, presents responses                    │
│    - consumes apollo_guidance attached to /chat responses    │
│    - never talks to Apollo                                   │
└────────────────────┬─────────────────────────────────────────┘
                     │  /chat (HTTP + traceparent)
                     ▼
┌──────────────────────────────────────────────────────────────┐
│  Layer 2 (L2): Oracle + Apollo                               │
│    Oracle: auth, routing, LLM dispatch, tool aggregation,    │
│            guidance attachment                               │
│    Apollo: observe, ground (graphs), advise (LLM synthesis), │
│            curate (admin-driven today; autonomous later)     │
└────────────────────┬─────────────────────────────────────────┘
                     │  MCP tool calls + apollo_guidance
                     ▼
┌──────────────────────────────────────────────────────────────┐
│  Layer 3 (L3): Backend agents + libraries                    │
│    agents: parallax, cortex (own their own LLM)              │
│    libraries: UDS, athena (no LLM, pure compute/IO)          │
│    - agents consume apollo_guidance from MCP arguments       │
│    - libraries receive no guidance (oracle filters)          │
│    - never talks to Apollo — oracle observes the round-trip  │
│      and emits on each service's behalf                      │
└──────────────────────────────────────────────────────────────┘

Milestone journey

The build is ordered so every milestone ships a coherent, merge-ready slice. Oracle remains fully functional throughout — Apollo is additive. Each milestone below records what shipped, what it proved, and why it matters; the canonical build-order contract for the same milestones is in §Implementation Plan.

Milestone 0 — Package scaffolding

What shipped. The oracle/oracle/ directory tree with stub modules matching the design spec's package structure. Apollo mounted into oracle's Starlette app at /api/v1/apollo/*. A single live endpoint: GET /api/v1/apollo/stats returning a bootstrap placeholder. Dependency additions (sentence-transformers, numpy) landed.

What it proved. Apollo can be loaded, mounted, and reached without breaking any pre-existing oracle functionality.

Why it matters. Everything later in the plan mounts on this scaffolding. By separating "wire the skeleton" from "build the brain," each subsequent milestone is a narrow PR instead of a sprawling rewrite.

Milestone 1 — Observation intake

What shipped. The primary in-process ingest path: oracle.oracle.observer.ingest.ingest(envelope) validates a Pydantic envelope and drops it on a bounded asyncio.Queue. A pool of background workers drains the queue, writes to Elasticsearch's apollo_observations index, and dedupes by (trace_id, event_type, timestamp, service) across a configurable window. A secondary HTTP path — POST /api/v1/apollo/observations — wraps the same queue so admin replay and future out-of-process emitters have a route. ApolloClient in axonis-core handles the secondary-path client side (batching + retry + flush on shutdown).

What it proved. Apollo can receive observations from oracle's internals and from the network, enqueue them without blocking the request path, and persist them durably. A 50-envelope batch returns 202 in under 10 ms; queue-full and worker-crash paths both increment counters rather than dropping observations silently.
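
A minimal sketch of the non-blocking enqueue and the dedupe step described above, using only the standard library. The metrics Counter, the sink list, and the function signatures are hypothetical stand-ins (the real path validates a Pydantic envelope and writes to Elasticsearch); only the dedupe key fields come from the text.

```python
import asyncio
import time
from collections import Counter

metrics = Counter()  # hypothetical stand-in for the Prometheus counters


def ingest(queue: asyncio.Queue, envelope: dict) -> bool:
    """Enqueue without blocking; count and drop when the queue is full."""
    try:
        queue.put_nowait(envelope)
    except asyncio.QueueFull:
        metrics["queue_dropped"] += 1
        return False
    metrics["accepted"] += 1
    return True


def dedupe_key(envelope: dict) -> tuple:
    return (envelope["trace_id"], envelope["event_type"],
            envelope["timestamp"], envelope["service"])


async def drain_worker(queue: asyncio.Queue, sink: list, window_sec: float = 60.0) -> None:
    """Drain forever; skip envelopes whose key was already seen inside the window."""
    seen: dict = {}
    while True:
        envelope = await queue.get()
        try:
            key, now = dedupe_key(envelope), time.monotonic()
            last = seen.get(key)
            if last is not None and now - last < window_sec:
                metrics["dedupe"] += 1
            else:
                seen[key] = now
                sink.append(envelope)  # stand-in for the ES write
        finally:
            queue.task_done()
```

The request path only ever calls ingest(), which returns immediately either way; all slow work happens in the worker.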

Why it matters. The whole rest of Apollo sits on this pipe. Ground truth has to arrive reliably before anything can reason about it.

Milestone 2 — Deterministic graph updates

What shipped. Five Decision Graphs backed by apollo_graph_nodes and apollo_graph_edges: intent_tool, prompt_shape, service_routing, outcome, and iteration. Each observation passes through rule-based extractors that produce (nodes_touched, edges_touched); the graph module upserts idempotently and updates short- and long-window EWMA weights. Hourly snapshots land in apollo_graph_snapshots. An in-memory mirror rebuilds from Elastic on startup so the hot path never hits ES.

What it proved. Every observation produces graph mutations with no LLM call. The math is deterministic — 1,000 synthetic observations reproduce byte-identical graph state on replay. EWMA weights converge to the expected values for a known sequence.
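
The short- and long-window EWMA update can be sketched as below. The smoothing factors (0.3 and 0.05) and the EdgeWeight type are illustrative assumptions; the source does not state the actual values or data model.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class EdgeWeight:
    short: float = 0.0  # reacts quickly to recent observations
    long: float = 0.0   # tracks the slower baseline

    def observe(self, value: float, alpha_short: float = 0.3, alpha_long: float = 0.05) -> None:
        # Standard EWMA step: new = alpha * value + (1 - alpha) * old.
        self.short = alpha_short * value + (1.0 - alpha_short) * self.short
        self.long = alpha_long * value + (1.0 - alpha_long) * self.long


def replay(values: Iterable[float]) -> EdgeWeight:
    """Fold a sequence of observations into a fresh weight, in order."""
    weight = EdgeWeight()
    for value in values:
        weight.observe(value)
    return weight
```

Because the update is a pure function of the prior weight and the observation, replaying the same sequence always yields the same state, and a constant input of 1.0 drives both weights toward 1.0.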

Why it matters. This is the grounding layer. When Apollo's LLM proposes "there's a new failure pattern in parallax's fusion_run_start tool," the graph-anchor check (Milestone 8) validates that claim against observed reality rather than letting the LLM hallucinate.

Milestone 3 — Guidance attach plumbing

What shipped. The apollo_guidance payload shape and the two places oracle attaches it:

  • L1 path: oracle's /chat response body carries apollo_guidance for every authenticated caller. Placeholder — see note below.
  • L3 path: oracle's MCP dispatches to agent-kind services carry apollo_guidance inside the arguments dict, same pattern as the existing llm_spec injection. library-kind services are filtered out.

The component_kind field was added to ServiceRegistry so the dispatch path can route on it. An in-process attacher returns {as_of, artifacts, rationale_summary} bounded by APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS (default 10 ms); on overshoot the field is simply omitted and the request succeeds.
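
The attach budget can be sketched with asyncio.wait_for. This is an assumption-laden illustration: attach_guidance, the timeouts Counter, and the attacher callable are hypothetical; only the payload keys and the "omit on overshoot" behavior come from the text.

```python
import asyncio
from collections import Counter
from typing import Awaitable, Callable

timeouts = Counter()  # hypothetical stand-in for apollo_guidance_attach_timeout_total{scope}


async def attach_guidance(
    response: dict,
    attacher: Callable[[], Awaitable[dict]],
    scope: str = "l1",
    timeout_ms: float = 10.0,
) -> dict:
    """Return the response with apollo_guidance attached, or unchanged on overshoot."""
    try:
        guidance = await asyncio.wait_for(attacher(), timeout=timeout_ms / 1000.0)
    except asyncio.TimeoutError:
        timeouts[scope] += 1
        return response  # field omitted; the request still succeeds
    return {**response, "apollo_guidance": guidance}
```

Callers treat apollo_guidance as optional, so a slow attacher costs at most the budget and never fails the request.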

What it proved. The delivery channel works end-to-end with an empty artifact set. /chat responses carry the field; MCP dispatches carry it; library dispatches don't; attach-timeout paths degrade cleanly.

Why it matters. Symmetric piggybacking — guidance rides the envelopes that were already travelling. No push transport, no long-lived connection, no service-token infrastructure. When synthesis (M8) starts producing real artifacts, the L3 pipe is fully wired (cortex consumes via M14) and the L2 pipe is in-process (oracle's chat LLM consumes via M15 — no transport since oracle hosts Apollo). The L1 attach side is wired on oracle's response; the L1 consumer side (beacon) waits on the L1 ↔ Oracle connection design.

Milestone 4 — Subscriber SDK (ApolloGuidanceCache)

What shipped. A pure-Python in-process cache that L1 and L3 agents use to consume attached guidance. Single mutation API (update(apollo_guidance_block)) and six canonical accessors (get_system_prompt_additions, get_spec_fragments, get_tool_description_overrides, get_tool_pairing_hints, get_active_failure_patterns, get_service_connection_hints). Artifacts ordered by (weight desc, recency desc). Empty-cache fallback returns empty lists / None without blocking.

Lives in axonis-core/axonis/core/apollo/guidance_cache.py. It has no HTTP client, no transport, and no ML dependency, so SPEC-01's "axonis-core has no ML dependencies" rule is preserved.
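
A reduced sketch of the cache's shape, showing the single mutation entry point, the (weight desc, recency desc) ordering, and the empty-cache fallback. The artifact fields (type, weight, updated_at, text) and the mapping from artifact type to accessor are assumptions for illustration; only two of the six accessors are shown.

```python
from typing import List, Optional


class GuidanceCacheSketch:
    """Illustrative stand-in for ApolloGuidanceCache; not the real class."""

    def __init__(self) -> None:
        self._artifacts: List[dict] = []

    def update(self, apollo_guidance_block: Optional[dict]) -> None:
        # Replace state wholesale, so applying the same block twice is a no-op.
        artifacts = (apollo_guidance_block or {}).get("artifacts") or []
        self._artifacts = sorted(
            artifacts,
            key=lambda a: (-a.get("weight", 0.0), -a.get("updated_at", 0.0)),
        )

    def _of_type(self, artifact_type: str) -> List[dict]:
        return [a for a in self._artifacts if a.get("type") == artifact_type]

    def get_active_failure_patterns(self) -> List[dict]:
        return self._of_type("FailurePattern")

    def get_system_prompt_additions(self) -> List[str]:
        return [a["text"] for a in self._of_type("PromptShim") if "text" in a]
```

An agent that never receives guidance simply sees empty lists from every accessor and keeps its pre-Apollo behavior.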

What it proved. Any agent — L1 UI, L3 agent service — can consume guidance through a single small class with no knowledge of Apollo's internals. Idempotent update, stable ordering, applicability filtering, clean empty-cache behavior.

Why it matters. The delivery protocol is one-way (Apollo → subscriber) and the SDK reflects that — it's a data sink with read accessors, nothing more. Swapping Apollo out entirely would mean the cache stops updating; agents would see empty results and continue working with pre-Apollo behavior.

Milestone 5 — Phase-1 emitter integration (oracle-sole-observer)

What shipped. Oracle became Apollo's sole emitter for every Phase-1 event type. The L3 emission path was redesigned here: cortex carries no Apollo emission code. Oracle observes each MCP round-trip it dispatches to cortex and emits tool_output / tool_error in-process on its behalf. L1 events (user_prompt, final_response, intent_schema, user_feedback) continue to be emitted by oracle in-process from the /chat handler. Oracle's own llm_turn events fire from the tool-executor loop. Parallax was originally part of Phase 1 but has been deferred to a later phase; its emitter and subscriber wiring follow the cortex pattern when it onboards.

Cortex carries exactly one Apollo-facing change at M5: its /service-info registration declares "component_kind": "agent" so oracle knows to attach apollo_guidance on its dispatches. Subscriber-side consumption of that attached guidance lands later, in M14.

What it proved. A full /chat request produces a lineage of observations all emitted by oracle under a single trace_id. Parallax and cortex source trees contain zero imports of ApolloClient or ApolloIntegration and zero lifespan wiring for Apollo.

Why it matters. This operationalized the architectural invariant. Apollo's surface shrinks — fewer client integrations, no service tokens, no per-service emission code to maintain across fleets. Adding a new L3 agent takes one line on its ServiceRegistry record (the component_kind declaration); oracle handles the rest.

Milestone 6 — Trace propagation (W3C traceparent)

What shipped. End-to-end trace stitching via the W3C traceparent header. Four pieces:

  • Canonical module (axonis/core/trace.py) in axonis-core with a parser, minter, ContextVar, and current_trace_id() accessor. Pure Python, ~20 lines of parsing, no OpenTelemetry dependency.
  • Ingress middleware (TraceparentMiddleware) sits outside OAuthMiddleware on oracle's app. Reads the header, validates it, installs the ContextVar. On missing → mints + increments apollo_missing_traceparent_total. On malformed → mints + increments apollo_malformed_traceparent_total. Required mode (APOLLO_REQUIRE_TRACEPARENT=true) returns 400 on either.
  • Propagation helper: axonis_core.gateway.client.extract_http_headers() now forwards traceparent alongside Authorization. Any client that uses this helper (MCPClient, RestClient) inherits forwarding for free.
  • Outbound injection — oracle's tool_executor.py and mcp/server.py use httpx directly, so they explicitly add the traceparent header on every outbound MCP POST.

What it proved. A single /chat request produces observations across every Phase-1 boundary — L1 user_prompt, L2 llm_turn, L3 tool_output / tool_error, L2 final_response — all under the same trace_id. The wire path is L1 → Oracle → L3; Apollo is never on the wire.

Why it matters. Lineage stitches. An operator investigating a failure can follow one request all the way through every layer by querying on a single id. The foundation for the admin inspection surface (M7) and the evaluator's outcome correlation (M10).

Milestone 7 — Admin inspection surface

What shipped. Ten admin-only endpoints mounted under /api/v1/apollo/:

  • Memory CRUD: list observations with filters, get one, seed synthetic, patch metadata (tags + admin_note), forget.
  • Stubs with stable shapes: GET /artifacts (empty until M8), GET /audit (empty until M9).
  • Guidance preview: GET /guidance?scope=l1 or ?scope=l3:<service> — returns what oracle would currently attach for that scope.
  • SSE debug feed: GET /guidance/stream?scope=... — live subscribers see every Curator commit (empty today; M9 starts writing them).
  • Subscriber registry: GET /subscribers.
  • Read-only admin chat: POST /chat with action: list_tools or action: invoke — LLM-less for M7, wired to the real LLM loop in M8.

Every endpoint gates on role == "admin" via a shared FastAPI dependency. Non-admin callers see 403. All mutation actions (seed, patch, delete) are audit-logged with the admin's username.

What it proved. Operators can inspect every observation Apollo has recorded, preview what guidance would currently attach, subscribe to the live debug feed, and exercise admin tooling via a chat surface — all before any autonomous Apollo behavior is enabled. The principle "admin must be able to see before Apollo is allowed to change" is now operational.

Why it matters. This is the gate that has to sit in front of every autonomous action that comes later. M9 (curator commits), M11 (admin chat mutations), M12 (autonomous curator) all route through inspection tools that landed here.

Milestone 8 — LLM synthesis + graph-anchor drift check

What shipped. Apollo's own LLM comes online. Four pieces:

  • LLM client (oracle/oracle/llm.py). Three providers, pluggable by env:
      • openai (production default): uses the existing openai SDK against any OpenAI-compatible endpoint (MiniMax hosted, Anthropic via a proxy, a local vLLM, etc.). No new dependency.
      • minimax-local (scaffolded): lazy HuggingFace transformers load of the stock MiniMax checkpoint using the canonical model-card signature. Weights resolve from the standard HF cache. For air-gapped clusters and operator-owned GPU inventory. Production-hardening knobs (APOLLO_LLM_LOCAL_MODEL_PATH, thread-pool offload, device mapping) are documented and deferred.
      • stub: canned responses for deterministic tests.
  • Prompt templates (oracle/oracle/learner/prompts.py). One builder per synthesis flavor: failure-pattern extraction (fires on tool_error bursts), intent-pattern clustering (fires on user_prompt / intent_schema), prompt-shim proposal (admin-initiated). Every template demands strict JSON output.
  • Synthesis dispatcher (oracle/oracle/learner/synthesis.py). Event-driven. Fires from the ingest worker whenever a triggering event type lands. Trace-id coalescing: a burst of three tool_error events on the same trace collapses to one LLM call. Bounded concurrency: APOLLO_SYNTHESIS_MAX_CONCURRENT (default 4) semaphore. Also exposed via POST /api/v1/apollo/learn for admin-initiated passes.
  • Graph-anchor drift check (oracle/oracle/learner/drift.py). Four deterministic sub-checks every LLM proposal passes through before it can become an approved artifact:
      • Pattern-vs-edges: does the proposed FailurePattern reference an error edge actually present in the outcome graph?
      • Intent-vs-clusters: does the proposed IntentPattern's class match an existing intent cluster?
      • Weight swings: is the proposed weight within the z-score threshold of the existing weight distribution?
      • Trajectory coherence: does the proposal's implied direction of change align with the EWMA trajectory?

Any failing check flags the proposal as a DriftEvent instead of approving it. No LLM involved in the check itself — purely math against graph state.
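
Two of the four sub-checks (pattern-vs-edges and weight swings) can be sketched as plain functions. The proposal fields, the edge representation as (tool, error_class) pairs, the z-score threshold of 3.0, and the decision to pass when there is too little history are all assumptions for illustration.

```python
from statistics import mean, pstdev
from typing import Sequence, Set, Tuple


def pattern_vs_edges(tool: str, error_class: str, error_edges: Set[Tuple[str, str]]) -> bool:
    """True when the proposed failure pattern points at an observed error edge."""
    return (tool, error_class) in error_edges


def weight_swing(proposed: float, existing: Sequence[float], z_threshold: float = 3.0) -> bool:
    """True when the proposed weight is within z_threshold std-devs of existing weights."""
    if len(existing) < 2:
        return True  # assumption: too little history to judge a swing
    mu, sigma = mean(existing), pstdev(existing)
    if sigma == 0:
        return proposed == mu
    return abs(proposed - mu) / sigma <= z_threshold


def check_proposal(proposal: dict, error_edges: Set[Tuple[str, str]],
                   existing_weights: Sequence[float]) -> dict:
    """Run the checks and keep per-check detail so a flag can be explained."""
    checks = {
        "pattern_vs_edges": pattern_vs_edges(proposal["tool"], proposal["error_class"], error_edges),
        "weight_swing": weight_swing(proposal["weight"], existing_weights),
    }
    status = "approved" if all(checks.values()) else "drift_flagged"
    return {"status": status, "checks": checks}
```

Keeping the per-check booleans on the result is what lets an admin see which anchor a flagged proposal failed.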

Approved proposals and drift events land on an in-memory pending list visible through GET /artifacts. M9 (Curator commits) will persist them to apollo_artifacts with versioning.

What it proved. A synthetic sequence of tool_error observations triggers one LLM call (coalesced across the burst); the LLM's JSON proposal passes through all four drift checks; consistent proposals become approved and show up on /artifacts immediately. Unsupported proposals get drift_flagged with per-check detail preserved so admins can see exactly why they were blocked.

Why it matters. This is the line where Apollo stops being a recorder and starts being an advisor. The graph-anchor principle is the critical piece: the graphs keep the LLM honest. Apollo can't hallucinate a failure pattern that no error edges support, and it can't invent an intent class no observations have clustered around.

Milestone 9 — Curator commits + versioning + audit

What shipped. Apollo can now mutate state. The Curator is the one subsystem empowered to persist artifact changes, and every mutation lands through the same atomic sequence: policy gate → history write → store update → audit write → SSE broadcast. Five actions: promote, demote, forget, edit, rollback. Five admin HTTP endpoints expose them; every endpoint gates on role == "admin".

Three new Elasticsearch indices back this: apollo_artifacts (current version of every artifact), apollo_artifact_history (every prior version, indefinite retention), and apollo_audit (the Curator audit log). Audit-record retention is configurable via APOLLO_AUDIT_RETENTION_DAYS (default 90); the forget and rollback actions write indefinite: true records that are never purged.

The policy gate enforces six hard invariants per SPEC-14 §Curator → Disallowed actions. It refuses any action that involves mutating auth / guardrails / token state, widening a caller's tool access, touching another user's conversation data, calling backend services on a user's behalf, or modifying / deleting audit records. Every action passes through the gate as its first step; if the gate raises, no downstream write happens.

Rationale is required non-empty. Pydantic validation at the model layer rejects blank rationales before any I/O, because audit review is the primary substrate for admin chat (§Rationale and evidence). A blank rationale defeats the whole surface.

Rollback provenance. When an admin rolls back v3 to v1's content, the Curator writes a new v4 whose content matches v1 but whose prev_version_id points at v3 (the pre-rollback current). The provenance chain stays linear: you can always trace "v4 came from rolling back v3 to v1's content" without special-case handling in the audit reader.
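
A small sketch of that rule: rollback appends a new version rather than rewinding the chain. ArtifactVersion and rollback() are hypothetical names; only prev_version_id and the v1/v3/v4 relationship come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ArtifactVersion:
    version: int
    content: str
    prev_version_id: Optional[int]


def rollback(chain: List[ArtifactVersion], target_version: int) -> List[ArtifactVersion]:
    """Append a new version that copies target content but links to the current head."""
    current = chain[-1]
    target = next(v for v in chain if v.version == target_version)
    restored = ArtifactVersion(
        version=current.version + 1,
        content=target.content,
        prev_version_id=current.version,  # points at the pre-rollback current
    )
    return chain + [restored]
```

Walking prev_version_id from the newest record therefore visits every version in order, including the one that was rolled back.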

SSE fan-out. Every successful Curator action broadcasts a curator_commit event to every admin watching the SSE debug feed (the channel that landed empty in M7). Admins see mutations as they happen; production ops can tail the stream during a synthesis burst to confirm nothing unexpected is being promoted.

Autonomous mode stays off. Per spec, M9 ships Curator in admin-triggered only mode — every mutation requires a human kicking it off through one of the admin endpoints. M12 flips the APOLLO_CURATOR_AUTONOMOUS switch that lets the Curator commit evolution-class proposals on its own.

What it proved. A full promote → edit → rollback → forget lifecycle works end-to-end. Every mutation is versioned, audited, policy-gated, and fan-out-broadcast. Failed actions (policy violations, nonexistent targets, reserved artifact-id namespaces) leave zero durable state behind. The four drift anchors from M8 continue to gate what can be promoted; drift_flagged proposals can't slip past the promote endpoint.

Why it matters. This is the transition from "Apollo watches and proposes" to "Apollo's proposals become persistent state." Before M9 the system could surface suggestions; after M9 admins can act on them with full version history and audit trails. The admin chat empowerment work in M11 and the autonomous Curator in M12 both mount on top of this — they are refinements to who can pull the trigger, not to what happens when the trigger is pulled.

Milestone 10 — Evaluator scoring + L3-performance amplification

What shipped. Apollo can now decay artifacts that stop working. Every active artifact carries a rolling EMA score; every observation on a trace that carried the artifact nudges the score; drops driven by L3-performance signals move the score faster than drops driven by user feedback or evaluator confidence. When an artifact's score stays below the 0.5 demote threshold long enough, the Evaluator writes a demotion recommendation; admins see it on GET /api/v1/apollo/recommendations and can act through the M9 mutation endpoints.

Five new modules under apollo/evaluator/:

  • signals.py: four failure-signal detectors, one per observation flavor:
      • L3_ERROR: tool_error envelopes (magnitude 1.0, both agent- and library-observed).
      • SCHEMA_MISMATCH: tool_output whose output dict is missing fields the L1 intent schema required. Library-dark: only fires when the observed service's component_kind == "agent". Libraries have no agent-level intent contract; the check is skipped for them per SPEC-14 Q9.
      • USER_FEEDBACK: user_feedback envelopes with sentiment in ("correction", "down", "abandoned"), at magnitudes 1.0 / 0.7 / 0.4 respectively.
      • EVALUATOR_CONFIDENCE: graph-anchor confidence gaps continuous in [0, 1].

  • scoring.py — per-artifact rolling EMA with weight tiers per SPEC-14 §Evaluator:

| Signal | Default weight | Sustained asymptote |
| --- | --- | --- |
| L3 error | 3.0 | 0.0 (demotable) |
| Schema mismatch | 3.0 | 0.0 (demotable) |
| User feedback | 1.5 | 0.4 (demotable) |
| Evaluator confidence | 0.5 | 0.8 (weakest; can't demote alone) |

Contribution is normalized so a single max-magnitude tick never drops the score below 0.7 — sustained signals are what drive demotion. Full per-signal decomposition (counts + magnitude totals) is preserved on every score so the Curator's audit records can explain exactly why a score moved.

  • cascade.py: three paths when an artifact's score updates:
      • Drift escalation (acute): ≥3 L3 signals within a 10-minute window → DriftEvent recommendation. Admin review required.
      • L3-dominant fast-demote (N=2 cycles): when consecutive drops are L3-driven → recommend_fast_demote.
      • Normal demote (N=5 cycles below threshold): score stayed sub-0.5 long enough → recommend_demote.

    Every non-none branch flags the artifact's upstream IntentPattern / PromptShim / SpecFragment for re-synthesis on the next trigger.

  • attribution.py — trace-id → applied-artifact-ids registry. Oracle's attacher records every attachment at dispatch time (via a new optional trace_id parameter on for_l1 / for_l3_agent); the evaluator queries by trace_id when signals arrive. Entries age out after APOLLO_GRAPH_TRACE_STATE_TTL_SEC.

  • recommendations.py — pending-demotion queue with replace-semantics (the latest recommendation for an artifact overrides the prior one). Admins see the queue via a new GET /recommendations endpoint; calling the M9 demote endpoint with the evaluator's score fields automatically clears the queue entry.
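The two bookkeeping structures described under attribution.py and recommendations.py can be sketched in memory as follows. The class and method names are hypothetical, and the real modules may persist state differently; the sketch only shows TTL expiry and replace-semantics.

```python
import time
from typing import Dict, List, Optional, Tuple


class TraceAttribution:
    """Maps trace_id to the artifact ids attached on that trace, with TTL expiry."""

    def __init__(self, ttl_sec: float) -> None:
        self._ttl = ttl_sec
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def record(self, trace_id: str, artifact_ids: List[str],
               now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        existing = self.lookup(trace_id, now)
        merged = existing + [a for a in artifact_ids if a not in existing]
        self._entries[trace_id] = (now, merged)

    def lookup(self, trace_id: str, now: Optional[float] = None) -> List[str]:
        now = time.monotonic() if now is None else now
        entry = self._entries.get(trace_id)
        if entry is None or now - entry[0] > self._ttl:
            self._entries.pop(trace_id, None)  # aged out
            return []
        return list(entry[1])


class RecommendationQueue:
    """Pending recommendations keyed by artifact id; the latest one wins."""

    def __init__(self) -> None:
        self._pending: Dict[str, dict] = {}

    def put(self, artifact_id: str, recommendation: dict) -> None:
        self._pending[artifact_id] = recommendation  # replace-semantics

    def clear(self, artifact_id: str) -> None:
        self._pending.pop(artifact_id, None)  # e.g. after an admin demotes

    def pending(self) -> Dict[str, dict]:
        return dict(self._pending)
```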

The demote audit record now carries evaluator_score, score_decomposition, and upstream_artifact_ids. The M9 schema already had these fields — M10 populates them when admins act on an evaluator recommendation. An admin auditing Apollo's Curator history can see for any demotion: what the rolling score was, how many signals of each kind had fired, and which upstream artifacts were flagged for re-synthesis.

Autonomous mode stays off. The Evaluator is an advisor at M10 — it writes recommendations, it doesn't demote on its own. M12 flips the autonomous switch.

What it proved. A synthetic run of 3 consecutive tool_error observations attributed to the same artifact produces an evaluator recommendation within 2–3 ticks. Signal weights are tunable without code changes (env-driven). Library-emitted observations correctly skip the schema-mismatch check. Bursts escalate to DriftEvent; slow drifts take the normal-demote path. Upstream refs are flagged on every recommendation so the next synthesis pass can re-examine the generators, not just the failing leaf.

Why it matters. The feedback loop closes here. Before M10, artifacts only moved forward. With the Evaluator in place, Apollo can tell an admin "pshim_xyz stopped working: 3 of the last 5 traces it guided failed; recommend demote", and that recommendation carries enough score decomposition for the admin to trust or reject it without re-deriving the math.

Milestone 11 — Admin chat empowerment

What shipped. Admins can now talk to Apollo in natural language. The admin-chat endpoint (POST /api/v1/apollo/chat with action: "chat") drives Apollo's LLM through a tool-use loop: the admin types a message, the LLM decides which tool to call, reads the result, decides whether to call another tool or compose the final prose answer. The loop is bounded by _MAX_CHAT_ITERATIONS (6) so a misbehaving LLM can't thrash.
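A minimal sketch of a bounded tool-use loop of this kind. The step format returned by `llm` (a dict with `tool`, `args`, `text`) and the fallback message when the bound is hit are assumptions; only the iteration limit of 6 comes from the text.

```python
from typing import Callable, Dict, List

MAX_CHAT_ITERATIONS = 6  # mirrors _MAX_CHAT_ITERATIONS


def run_chat_loop(message: str,
                  llm: Callable[[List[dict]], dict],
                  tools: Dict[str, Callable[..., object]]) -> str:
    """Drive the loop until the LLM answers in prose or the bound is reached."""
    transcript: List[dict] = [{"role": "user", "content": message}]
    for _ in range(MAX_CHAT_ITERATIONS):
        step = llm(transcript)
        if step.get("tool") is None:
            return step["text"]  # final prose answer
        result = tools[step["tool"]](**step.get("args", {}))
        transcript.append({"role": "tool", "name": step["tool"], "content": result})
    # Assumption: the real endpoint may report this differently.
    return "Stopped: tool-call limit reached before a final answer."
```

The hard bound means a model that keeps requesting tools costs at most six LLM calls per admin message.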

Full tool catalog — 15 tools. Every action admins could previously take via the REST endpoints is now available as a chat tool, plus two purely conversational tools:

Tool | Kind | Purpose
-----|------|--------
list_memories / get_memory / list_decisions | read | Inspect observations + audit records
explain_decision | read | "Why did you demote pshim_xyz?" (pulls audit + resolves evidence)
discuss_decision | read | Pre-loads full context (audit + current artifact + upstream flags) for a focused thread
promote_artifact / demote_artifact / rollback_artifact / forget_artifact / edit_artifact | mutate | Curator actions via chat
forget_memory | mutate | Delete an observation
rollback_graph | mutate | Restore a prior graph snapshot
trigger_synthesis | mutate | Admin-initiated synthesis pass
pause_curator / resume_curator | mutate | Emergency freeze of all mutations

Every chat-initiated mutation writes an audit record with actor: "admin:<username>" and trigger: "admin_chat" so the full audit log shows whether the action came from the REST endpoints, the chat LLM loop, or (eventually M12) the autonomous curator.

Curator pause/resume. A new process-wide flag in apollo/curator/pause.py. When flipped on, every Curator mutation raises CuratorPaused at the top of its call — no history write, no audit record, no state change. Pause and resume themselves write indefinite audit records so the full pause history outlasts retention. The admin-SSE debug feed fans out curator_paused / curator_resumed events on the flip so live operators see the state change immediately.

Rollback semantics for paused state: resume_curator is gated on admin role, not on the pause flag itself, so it stays callable while every other mutation is frozen. The pause can only be lifted by admin action, never by any autonomous path. This is the emergency-off-switch contract.
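The "raise at the top of the call" behavior can be sketched with a decorator. Everything here except the CuratorPaused name is a hypothetical stand-in for apollo/curator/pause.py; the admin-role gate and the audit writes on pause and resume are omitted.

```python
import functools
import threading


class CuratorPaused(RuntimeError):
    """Raised when a mutation is attempted while the curator is paused."""


_paused = threading.Event()  # process-wide flag


def pause_curator() -> None:
    _paused.set()


def resume_curator() -> None:
    _paused.clear()


def curator_mutation(fn):
    """Reject the call before any side effect if the curator is paused."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        if _paused.is_set():
            raise CuratorPaused(fn.__name__)
        return fn(*args, **kwargs)
    return wrapper


@curator_mutation
def demote_artifact(store: dict, artifact_id: str) -> None:
    # Illustrative mutation; the real one also writes history and audit records.
    store[artifact_id] = "demoted"
```

Because the check runs before the wrapped function body, a paused call leaves no history write, audit record, or state change behind.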

Conversational explanation flow. explain_decision and discuss_decision are the substrate for the admin's "why did you do this?" surface. Given an audit_id / artifact_id / trace_id, the helpers load the audit record, the artifact's current state, any upstream artifacts the evaluator flagged, and the observations on referenced traces — all bundled into a single tool_result the LLM folds into its prose answer.

What it proved. A real conversational flow works end-to-end: admin types "why did you demote pshim_xyz?" → LLM calls explain_decision(artifact_id="pshim_xyz") → reads the audit record → composes a response citing the concrete reason. Mutation tools work the same way. When the curator is paused, mutation tools surface the pause as a structured tool result so the LLM explains the freeze rather than retrying blindly.

Why it matters. This is the moment Apollo becomes operationally conversational. After M11, an admin can have a conversation — ask about recent curator activity, drill into why a specific artifact was demoted, roll back a bad decision, pause the whole curator during an investigation — all through one chat surface. The tool-use loop + policy gate + audit trail make this safe.

Milestone 12 — Autonomous Curator + drift prevention tuning

What shipped. Apollo's Curator can now commit without admin intervention. A periodic sweep (apollo/curator/auto.py, default 30s cadence) reads both the synthesis pending list and the evaluator recommendation queue, distinguishes evolution-class work (safe to auto-commit) from drift-class work (still needs admin review), and acts on the evolution-class cases with actor="curator_auto" and trigger="autonomous_curator".

The split:

Source | Evolution-class → auto | Drift-class → admin-only
-------|------------------------|-------------------------
Synthesis pending list | status: "approved" proposals (drift-check passed) | status: "drift_flagged" proposals
Evaluator recommendation queue | kind: "demote" / "fast_demote" | kind: "drift_event" (acute bursts)
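The split can be sketched as a small partition function. The function name and the fail-closed choice (anything unrecognized goes to admin review) are assumptions; the status and kind values come from the table.

```python
from typing import Iterable, List, Tuple

AUTO_PROPOSAL_STATUSES = {"approved"}
AUTO_RECOMMENDATION_KINDS = {"demote", "fast_demote"}


def split_work(proposals: Iterable[dict],
               recommendations: Iterable[dict]) -> Tuple[List[dict], List[dict]]:
    """Partition pending work into (auto-committable, admin-only)."""
    auto: List[dict] = []
    admin: List[dict] = []
    for p in proposals:
        (auto if p.get("status") in AUTO_PROPOSAL_STATUSES else admin).append(p)
    for r in recommendations:
        (auto if r.get("kind") in AUTO_RECOMMENDATION_KINDS else admin).append(r)
    return auto, admin
```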

Deterministic artifact ids. When the autonomous promoter commits a new proposal, it derives the artifact_id from the proposal body: fp_<hash(service+tool+signature)> for FailurePattern, ip_<hash(intent_class+prompt_shape)> for IntentPattern, etc. The same logical pattern converges on the same id across repeated synthesis passes — versions accumulate on one artifact instead of proliferating into many.
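A sketch of content-derived ids under stated assumptions: the text gives only the `fp_` / `ip_` prefixes and the fields that feed the hash, so the choice of SHA-256, the 16-character truncation, and the field separator are all illustrative.

```python
import hashlib


def _digest(*parts: str, length: int = 16) -> str:
    # The unit separator keeps ("ab", "c") from colliding with ("a", "bc").
    joined = "\x1f".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:length]


def failure_pattern_id(service: str, tool: str, signature: str) -> str:
    return "fp_" + _digest(service, tool, signature)


def intent_pattern_id(intent_class: str, prompt_shape: str) -> str:
    return "ip_" + _digest(intent_class, prompt_shape)
```

Because the id depends only on the proposal body, a second synthesis pass that rediscovers the same pattern lands on the existing artifact and adds a version rather than creating a new one.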

The same guards. Every auto-commit flows through the M9 Curator atomic sequence (policy gate → history → store → audit → SSE broadcast). The pause flag gates autonomous commits the same way it gates admin-triggered ones. Autonomous mode flips via APOLLO_CURATOR_AUTONOMOUS=true (default false so prior milestones' behavior is preserved on upgrade).

The safety seam. Drift-class work (the acute L3 bursts and the drift-check flags from M8) is deliberately retained for admin review. An autonomous Curator that also committed drift-class proposals would be fighting the drift-check's entire purpose.

What it proved. A synthesis pending list with 10 approved proposals becomes 10 committed artifacts in one sweep, each with actor: "curator_auto". Mixed queues auto-commit only the approved ones. Pausing the curator mid-sweep stops it cleanly. The same proposal body always resolves to the same artifact_id.

Why it matters. This is the inflection point where Apollo stops needing admin attention to get its basic job done. With M12, the admin's role shifts from approver of every mutation to reviewer of drift cases + auditor after the fact.

Milestone 13 — Maintenance + /stats polish + degraded-emitter reporting

What shipped. The ops-readiness milestone. Three pieces:

1. Hourly maintenance loop (apollo/maintenance.py). Runs on APOLLO_MAINTENANCE_INTERVAL (default 1h). On each pass:
   - Scans apollo_observations, apollo_audit, and apollo_graph_snapshots and runs delete_by_query on every doc whose expires_ts < now().
   - Indefinite audit records (forget, rollback, pause_curator, resume_curator, graph_rollback) carry expires_ts: null and are never touched.
   - Metrics emitted on every run: apollo_maintenance_last_run_ts + apollo_maintenance_docs_deleted_total{index}.
   - Snapshot coarsening (hourly → daily → weekly tiers per SPEC-14 Q5) is scaffolded; M13 ships the hook without the tier-generation logic (deferred until production accumulates enough data to warrant it).

2. /stats surface expansion. Two new top-level keys:
   - degraded_emitters: services whose apollo_ingest_last_ingest_ts is older than APOLLO_INGEST_STALE_WARN_SEC (default 300).
   - intent_schema_coverage: rolling percentage of traces in the last 24 hours that carried an intent_schema observation. Null when there's no data in the window. Substrate for the eventual flip to APOLLO_REQUIRE_INTENT_SCHEMA=true in Phase 3.

3. Extended audit filters. GET /api/v1/apollo/audit supports two new query params:
   - trigger: term filter on the mutation trigger (admin_endpoint, admin_chat, autonomous_curator, etc.).
   - artifact_type: prefix filter on artifact_id (fp_, ip_, ps_, sf_).
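The purge rule from item 1 can be sketched as a predicate plus the matching query body. The sketch assumes expires_ts is stored as a numeric timestamp and that the index mapping does not set a null_value for it; the function names are hypothetical.

```python
from typing import Optional


def is_expired(doc: dict, now: float) -> bool:
    """True when the doc has a concrete expires_ts in the past; null means keep forever."""
    expires_ts: Optional[float] = doc.get("expires_ts")
    return expires_ts is not None and expires_ts < now


def purge_query(now: float) -> dict:
    """Body for an Elasticsearch delete_by_query call.

    A range query does not match documents whose field is null or missing,
    so indefinite records are left alone without an explicit exclusion.
    """
    return {"query": {"range": {"expires_ts": {"lt": now}}}}
```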

What it proved. Expired observations and routine audit records purge cleanly; indefinite records survive the sweep. The /stats surface flags stale services without alerting on services that simply haven't been observed yet. The audit endpoint's filters compose (AND semantics).
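A sketch of how the two /stats keys could be derived, assuming the caller already has per-service last-ingest timestamps and trace counts for the 24-hour window. The function signatures are illustrative; the 300-second default and the null-on-empty behavior come from the text.

```python
from typing import Dict, List, Optional

STALE_WARN_SEC = 300  # default of APOLLO_INGEST_STALE_WARN_SEC


def degraded_emitters(last_ingest_ts: Dict[str, Optional[float]], now: float,
                      stale_sec: float = STALE_WARN_SEC) -> List[str]:
    """Services seen before but silent for longer than stale_sec.

    Services that were never observed (timestamp None) are not flagged.
    """
    return sorted(
        service for service, ts in last_ingest_ts.items()
        if ts is not None and now - ts > stale_sec
    )


def intent_schema_coverage(traces_with_schema: int, traces_total: int) -> Optional[float]:
    """Percentage of traces that carried an intent_schema observation; None if no data."""
    if traces_total == 0:
        return None
    return 100.0 * traces_with_schema / traces_total
```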

Why it matters. An always-on system that grows state forever isn't operable. M13 is the janitor that lets Apollo run indefinitely. It's also the final piece for Phase 3 readiness — once intent_schema_coverage is high enough in production, the APOLLO_REQUIRE_INTENT_SCHEMA and APOLLO_REQUIRE_TRACEPARENT flags can flip to true without introducing data loss.

Milestone 14 — Subscriber LLM consumption (cortex L3)

The gap M14 closes. Through M0–M13, oracle attached apollo_guidance to every /chat response (L1) and every outbound MCP dispatch bound for an agent-kind L3 service. The wire was carrying guidance, but no subscriber actually read it. Cortex's MCP tool signatures didn't declare an apollo_guidance parameter, so FastMCP silently stripped the field before the handler ran. The brain was thinking; the body wasn't listening.

What shipped. M14 wires the consumption side into cortex (L3) per the contract locked in Q20:

  • Cortex (L3): request-scoped ApolloGuidanceCache populated at the top of _handle_tools_call. apollo_guidance is popped from the inbound arguments dict (so FastMCP no longer strips it) and the cache is exposed to tool implementations via a ContextVar. Tools that internally run an LLM read the cache's accessors before composing their per-call system prompt and tool catalog.
  • SDK distribution. Cortex imports ApolloGuidanceCache directly from axonis.core.oracle.guidance_cache — the canonical source per Q15. Cortex's pyproject.toml declares axonis-core>=0.1.0, mirroring oracle's pattern. (An earlier branch of M14 vendored the module locally to keep the agent lightweight, but cortex already imported axonis.core.llm.LLMSpec for its narrative tool, so the vendoring decision didn't pay for itself; the import was unified on the canonical path. Q15 still allows vendoring for future subscribers whose dep posture differs from cortex's.)
  • Failure posture (Q20): cache.update(None) is a legal no-op; missing/malformed guidance never blocks the tool. Without guidance, accessors return empty lists / None and the tool's prompt builder behaves exactly as it did pre-M14.
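The pop-then-ContextVar pattern can be sketched as below. GuidanceCacheSketch is a hypothetical stand-in for ApolloGuidanceCache (whose real accessors are not shown in this document), and the `text` field on artifacts is an assumption made so the prompt builder has something to append.

```python
from contextvars import ContextVar
from typing import List, Optional


class GuidanceCacheSketch:
    """Hypothetical stand-in for ApolloGuidanceCache."""

    def __init__(self) -> None:
        self._artifacts: List[dict] = []

    def update(self, guidance: Optional[dict]) -> None:
        if not isinstance(guidance, dict):
            return  # None or malformed guidance is a legal no-op
        artifacts = guidance.get("artifacts")
        if isinstance(artifacts, list):
            self._artifacts = artifacts

    def artifacts(self) -> List[dict]:
        return list(self._artifacts)


_current_cache = ContextVar("apollo_guidance_cache", default=None)


def handle_tool_call(arguments: dict, tool):
    """Pop guidance before the tool sees its arguments, then expose it per request."""
    cache = GuidanceCacheSketch()
    cache.update(arguments.pop("apollo_guidance", None))
    token = _current_cache.set(cache)
    try:
        return tool(**arguments)
    finally:
        _current_cache.reset(token)  # never leak guidance into the next request


def build_system_prompt(base: str) -> str:
    cache = _current_cache.get()
    items = cache.artifacts() if cache is not None else []
    hints = [a["text"] for a in items if isinstance(a, dict) and "text" in a]
    return base if not hints else base + "\n" + "\n".join(hints)
```

With no guidance, or with a malformed payload, `build_system_prompt` returns the base prompt unchanged, which matches the failure posture described above.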

Why beacon (L1) is deferred. Beacon currently has no HTTP connection to oracle — its MCP_SERVER_URL defaults to http://localhost:8000/mcp (cortex direct), so attached apollo_guidance has no path into beacon's process today. Once the beacon↔oracle connection lands, beacon's wiring follows the same SDK pattern as cortex.

Why parallax is deferred. Parallax's wiring follows cortex's pattern verbatim (MCP handler argument pop + request-scoped cache contextvar), but is deferred until parallax's own Phase 1 onboarding lands.

What it proved. Integration tests now assert the system prompt sent to a downstream LLM call inside a cortex tool observably grows when guidance is present and is unchanged when absent — the assertion the M0–M13 tests stopped short of.

Why it matters. Without M14, every artifact Apollo synthesized through M8–M12 was attached to envelopes and discarded by the recipients. M14 is the difference between a system that records and reasons and a system that actually steers behavior on the L3 side.

The journey, summarized

Apollo started as a design spec in M0 and, by M14, was a live observation/learning/guidance system whose advice is read at runtime by L3 LLMs (L1 consumption through beacon remains deferred). Fourteen milestones, each a reversible slice that left oracle functional whether or not the milestone shipped. No rewrites. No regressions.

Phase | Milestones | What it delivered
------|------------|------------------
Phase 1: Observe + ground | M0–M6 | Ingest, graphs, guidance attach, subscriber SDK, oracle-sole-observer, trace propagation
Phase 2: Synthesize + advise | M7–M10, M14 | Admin inspection, LLM synthesis + drift check, Curator commits, Evaluator scoring; M14 retroactively closes Phase 2's Injection Channel commitment on the L3 side by wiring cortex consumption (the L1 side is deferred)
Phase 3: Empower + maintain | M11–M13 | Admin chat empowerment, autonomous Curator, hourly maintenance

Cumulative capability matrix (as of M14 — final)

Capability Status
Users can call /chat; responses carry apollo_guidance field ✅ M3
Every /chat turn produces a full observation lineage under one trace_id ✅ M5 + M6
Cortex (L3) consumes attached guidance via ApolloGuidanceCache ✅ M14 (SDK M4; oracle attach M5; consumption wiring M14)
Beacon (L1) consumes attached guidance via ApolloGuidanceCache ⏸ Deferred — gated on a beacon↔oracle connection (no path today)
L3 libraries correctly skip guidance injection ✅ M3
Observations land in apollo_observations with dedup ✅ M1
Deterministic graph updates on every observation ✅ M2
Hourly graph snapshots ✅ M2
Admin can inspect observations, graph state, and stats ✅ M7
Admin SSE debug feed is live (Curator commits fan out per M9; pause/resume per M11; autonomous commits per M12) ✅ M7 (channel) + M9 (commits)
Apollo's LLM fires on triggering events, proposes artifacts ✅ M8
Proposals gated by four-check graph-anchor drift ✅ M8
Admin can trigger synthesis manually via POST /learn ✅ M8
Trace propagation through L1 → Oracle → L3 ✅ M6
Non-admin callers blocked from every admin endpoint ✅ M7
APOLLO_EMITTER_ENABLED=false kills emission cleanly ✅ M5
Local MiniMax via HuggingFace scaffolded ✅ M8
Admin can promote approved proposals into active artifacts ✅ M9
Every Curator mutation is versioned (apollo_artifact_history) and audited (apollo_audit) ✅ M9
Curator policy gate blocks six hard invariants ✅ M9
Admin can edit / demote / forget / rollback artifacts ✅ M9
forget and rollback write indefinite: true audit records ✅ M9
Curator commits fan out to admin SSE debug feed ✅ M9
Per-artifact rolling score driven by four failure signals ✅ M10
L3 signals carry amplified weight; fast-demote after N=2 L3-dominant cycles ✅ M10
Acute L3 bursts escalate to DriftEvent rather than silent demotion ✅ M10
Upstream artifacts flagged for re-synthesis on every recommendation ✅ M10
Admins see the recommendation queue on GET /recommendations ✅ M10
Demote audit records carry evaluator_score + full per-signal decomposition ✅ M10
Admin-chat LLM loop — natural language drives tool selection ✅ M11
explain_decision / discuss_decision surface audit + evidence conversationally ✅ M11
Chat tools cover every mutation ✅ M11
pause_curator / resume_curator — emergency off-switch with indefinite audit ✅ M11
Chat-initiated mutations audited with actor: "admin:<username>" + trigger: "admin_chat" ✅ M11
Autonomous Curator auto-commits evolution-class proposals + recommendations ✅ M12
Drift-class work retained for admin review ✅ M12
Deterministic artifact_id derivation ✅ M12
Auto-commits gated by APOLLO_CURATOR_AUTONOMOUS flag + pause state ✅ M12
Hourly maintenance job — expired-doc purge via delete_by_query ✅ M13
Indefinite audit records never purged ✅ M13
/stats surfaces degraded_emitters + intent_schema_coverage ✅ M13
GET /audit supports trigger + artifact_type filters ✅ M13

Test totals (live counts; refresh on each milestone boundary): see the captured runs under oracle/docs/proof/consumption/l3-cortex-in-process/. At time of M14 landing: 416 oracle + 18 cortex M14 + 16 oracle graph + 138 oracle learning-loop tests passing; cortex full suite at 3026 / 0 / 307 (excluding two pre-existing orphan files). Specific repo counts shift as suites grow.

What a live /chat request looks like today

Note on the L1 caller. "L1 caller" means whatever POSTs to oracle's POST /api/v1/chat: curl, an integration test, or a direct API client today; beacon once a beacon↔oracle connection lands. /api/v1/chat is oracle's user-facing chat surface, driven by oracle's own LLM tool-use loop. It is distinct from /api/v1/apollo/chat, which runs Apollo's separate MiniMax LLM for admin chat with Apollo.

Tracing one request from any L1 caller to response:

  1. L1 → Oracle. The L1 caller sends POST /chat with a user message and a traceparent header.
  2. TraceparentMiddleware (M6) parses the header, installs the trace_id on a ContextVar for the lifetime of the request.
  3. Handler. /chat extracts the prompt, calls emit_user_prompt(...) which drops an envelope on Apollo's in-process queue. An llm_turn observation fires on each LLM cycle inside the tool-use loop.
  4. Tool call. LLM decides to call a cortex tool. Oracle looks up the tool; cortex is component_kind="agent", so attacher.for_l3_agent(...) is called to produce the current apollo_guidance payload (M3). The payload is injected into the MCP arguments alongside llm_spec. traceparent is forwarded on the outbound HTTP.
  5. L3. Cortex's MCP handler pops apollo_guidance from arguments, populates a request-scoped ApolloGuidanceCache via the contextvar (M14), runs the tool, and responds over MCP.
  6. Oracle observes. The MCP response returns. Oracle emits tool_output (or tool_error) in-process (M5) under the same trace_id.
  7. LLM loop continues until the model returns text.
  8. emit_final_response fires. The response body is assembled with apollo_guidance attached (M3). L1 receives it.
  9. In the background. Apollo's ingest workers persist each observation. Each passes through the deterministic extractors and updates the five Decision Graphs (M2). For qualifying event types, the synthesis engine schedules a coalesced LLM pass (M8); any proposal goes through the four-check drift gate and lands on the pending list. M12's autonomous-curator sweep promotes evolution-class proposals on its own; drift-class work waits for admin review.

Everything that happens between button-click and response is on the hot path. Everything that updates graphs, triggers synthesis, or writes audit records happens on background tasks that can't stall the request.

Roadmap (future work beyond M14)

Follow-up work tracked separately (full operational detail in §Future Improvements):

  • Beacon ↔ Oracle connection. M14 deferred the L1 subscriber wiring because beacon's MCP_SERVER_URL defaults to cortex direct. Tracked in §Future Improvements §2.3. Once the connection design lands, beacon's L1 consumption follows the cortex SDK pattern.
  • Parallax onboarding. Same wiring pattern as cortex's M14 when parallax's Phase 1 work lands.
  • Additional L3 emitter onboarding. UDS, athena, testament, titan, rest/fedai-rest — each onboards via either in-process relay or direct ApolloClient POST. No Apollo code change required.
  • Production-grade minimax-local LLM provider. The scaffold landed in M8; deferred work: APOLLO_LLM_LOCAL_MODEL_PATH, thread-pool offload, device/quantization knobs, pre-pull readiness gate, streaming tokens.
  • Required-mode flips (APOLLO_REQUIRE_INTENT_SCHEMA, APOLLO_REQUIRE_TRACEPARENT). Once M13's coverage stat is steadily high, operators can flip these to true.
  • Snapshot tier generation. M13 shipped the coarsening hook; the actual hourly→daily→weekly snapshot generation is deferred until production data volume warrants it.
  • Federation of artifacts. axonis-core's UDS pattern supports federation; Apollo's Curator will use it to share high-confidence artifacts across federated deployments in a later phase.

Post-M15: Prioritization Layers

After M14 closed the consumption loop and M15 wired oracle's own chat LLM as the L2 subscriber, the next pressure surfaced empirically: with the active artifact set growing, the attacher had no way to prefer the better artifacts. A multi-artifact stress test showed three problems on a single attach: silent drops at the cap, recency-eviction beating real quality, and zero observability for what was held back.

The response was a seven-layer rebuild of the selection path:

  1. Capped lineage: every dropped artifact gets a kind: "capped" row, so admins can surface cases like "this shim matched 47 times but was never sent."
  2. Smarter sort key: five-tier priority (evaluator quality → synthesis confidence → applicability specificity → weight → recency) replaces the old "default weight 1.0 → recency wins."
  3. Signal preservation at promote: a contract test pins that evaluator_score, confidence, and weight survive the metadata strip.
  4. Real signals flowing: 4-A wires the evaluator to write scores back to the artifact; 4-B teaches synthesis to emit a confidence per proposal.
  5. Deepened rationale: rationale_summary now names artifact IDs (attached + capped), and a new per-artifact aggregation query answers "how often is this artifact being shadowed?"
  6. Similarity: embeddings at promote (6-A) drive a promote-time advisory (6-B) and a periodic curator-time merger sweep (6-C).
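Layers 1 and 2 can be sketched together: rank by the five-tier key, attach up to the cap, and emit a "capped" row for everything held back. The field names `specificity` and `promoted_ts`, the neutral defaults, and the shape of the capped row beyond kind: "capped" are assumptions.

```python
from typing import List, Tuple


def priority_key(artifact: dict) -> tuple:
    """Five-tier key; higher values sort first. Missing signals get neutral defaults."""
    return (
        artifact.get("evaluator_score", 0.0),  # evaluator quality
        artifact.get("confidence", 0.0),       # synthesis confidence
        artifact.get("specificity", 0),        # applicability specificity
        artifact.get("weight", 1.0),           # weight
        artifact.get("promoted_ts", 0.0),      # recency as the final tie-breaker
    )


def select_with_cap(candidates: List[dict], cap: int) -> Tuple[List[dict], List[dict]]:
    """Return (attached artifacts, capped lineage rows)."""
    ranked = sorted(candidates, key=priority_key, reverse=True)
    attached = ranked[:cap]
    capped = [{"kind": "capped", "artifact_id": a["artifact_id"]} for a in ranked[cap:]]
    return attached, capped
```

With recency as the last tier, an older artifact with a strong evaluator score outranks a newer, weaker one, which is the behavior the stress test found missing.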

Each layer is independently disabled by an env flag and degrades cleanly when its prerequisites aren't present. The full contract is in SPEC-PLATFORM-14-APOLLO.md §Prioritization Layers; the historical changelog of what shipped is in docs/APOLLO-FUTURE-IMPROVEMENTS.md §12.

The remaining work is the cap-defaults empirical study — once production telemetry has accumulated, revisit whether the per-type caps and similarity thresholds need tuning. That's a data-collection task, not code.

End-to-End Scenario

A presentation-ready walkthrough of a live request flowing L1 caller → Oracle/Apollo → Cortex → Oracle/Apollo → L1 caller. This section accompanies the automated integration test at oracle/apollo/tests/test_integration_beacon_to_l3.py and explains what the test proves about the real production flow.

Note on the L1 caller. The hop diagram below names "beacon" as the L1 caller, but beacon does not currently call oracle. The L1 hop today is exercised only by curl or other direct callers against POST /api/v1/chat — oracle's user-facing chat surface, driven by oracle's own LLM tool-use loop. POST /api/v1/apollo/chat is a separate admin-scoped surface that runs Apollo's independent MiniMax LLM and is not the L1 path. The flow below describes the /api/v1/chat path; beacon onboards once a beacon↔oracle connection is wired.

Audience. Engineers demoing the system, reviewers verifying the architecture holds, and anyone who wants to see what Apollo actually does on one request.

The hop sequence

beacon                   oracle + apollo                    cortex
  │                            │                              │
  │ POST /api/v1/chat          │                              │
  │ Authorization: Bearer …    │                              │
  │ traceparent: 00-<tid>-…    │                              │
  │ body: {message, convo}     │                              │
  ├───────────────────────────▶│                              │
  │                            │                              │
  │      (1) TraceparentMiddleware                            │
  │          validates / mints, sets ContextVar               │
  │                            │                              │
  │      (2) /chat handler:                                   │
  │          oracle.hooks.chat.emit_user_prompt(…)            │
  │          → apollo_observations  [L1-origin]               │
  │                            │                              │
  │      (3) ToolExecutor.run(…)  — LLM tool-use loop         │
  │          • llm.router.complete(…)                         │
  │          • oracle.hooks.chat.emit_llm_turn(…)             │
  │            → apollo_observations  [L2-origin]             │
  │                            │                              │
  │      (4) _call_backend_tool:                              │
  │          • attacher.for_l3_agent(…) → apollo_guidance     │
  │          • tool_args["apollo_guidance"] = {…}             │
  │          • outbound httpx POST to cortex /agentspace/mcp  │
  │            + traceparent header forwarded                 │
  │                            ├─────────────────────────────▶│
  │                            │                              │
  │                            │     cortex extracts          │
  │                            │     apollo_guidance,         │
  │                            │     feeds its local          │
  │                            │     ApolloGuidanceCache,     │
  │                            │     runs the tool,           │
  │                            │     responds                 │
  │                            │◀─────────────────────────────┤
  │      (5) After dispatch returns:                          │
  │          oracle.hooks.chat.emit_tool_output(…)            │
  │          → apollo_observations  [L3-origin,               │
  │             emitted by oracle on cortex's behalf]         │
  │                            │                              │
  │      (6) oracle.hooks.chat.emit_final_response(…)         │
  │          → apollo_observations  [L2-origin]               │
  │                            │                              │
  │      (7) attacher.for_l1(…) → apollo_guidance block       │
  │                            │                              │
  │ ◀──────────────────────────┤                              │
  │ 200 OK                     │                              │
  │ { response, tool_calls,    │                              │
  │   apollo_guidance: {…} }   │                              │
  │                            │                              │
  │ beacon's local             │                              │
  │ ApolloGuidanceCache        │                              │
  │   .update(apollo_guidance) │                              │

Everything inside the oracle/apollo box runs in one process on one event loop. The only network hops are L1 → Oracle and Oracle → L3 (and the return legs). Apollo itself is never on the wire.

The integration test — what each scenario proves

oracle/tests/test_integration_beacon_to_l3.py exercises this flow end-to-end in-process. A fake cortex ASGI app stands in for the real service; httpx is routed through it via httpx.ASGITransport. All of oracle's real code runs — auth dependency (stubbed payload), TraceparentMiddleware, /chat handler, ToolExecutor loop, Apollo emission helpers, attacher, guidance cache SDK. No mocks on the Apollo surface itself.

Scenario 1 — happy path (TestHappyPath)

Beacon sends a well-formed request; cortex answers successfully. The test asserts:

  1. 200 OK with response text and tool_calls array populated.
  2. Cortex received apollo_guidance in its MCP arguments dict. Oracle's M3 attacher ran at dispatch time; the payload has the AttachedGuidance shape (as_of + artifacts + rationale_summary).
  3. Cortex received traceparent in HTTP headers. M6 propagation intact.
  4. Response body carries apollo_guidance — beacon's local ApolloGuidanceCache.update(body.apollo_guidance) succeeds.
  5. Observation lineage stitches under one trace_id: user_prompt, llm_turn, tool_output, and final_response all land in apollo_observations with the same W3C trace-id the caller sent.
  6. tool_output envelope's arguments has apollo_guidance stripped. Apollo doesn't observe its own injection as part of the caller's intent — the strip happens in emit_tool_output before enqueueing.
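Assertions 5 and 6 amount to a check over the list of observations for one request. The helper below is a sketch, not the test's actual code; the observation layout (`trace_id`, `event_type`, `payload.arguments`) is an assumption.

```python
from typing import Iterable, List

EXPECTED_EVENTS = {"user_prompt", "llm_turn", "tool_output", "final_response"}


def lineage_problems(observations: Iterable[dict], trace_id: str) -> List[str]:
    """Return human-readable problems; an empty list means the lineage is intact."""
    observations = list(observations)
    problems = []
    seen = {o.get("event_type") for o in observations}
    for missing in sorted(EXPECTED_EVENTS - seen):
        problems.append(f"missing event: {missing}")
    for o in observations:
        if o.get("trace_id") != trace_id:
            problems.append(f"{o.get('event_type')}: wrong trace_id")
        args = (o.get("payload") or {}).get("arguments") or {}
        if o.get("event_type") == "tool_output" and "apollo_guidance" in args:
            problems.append("tool_output: apollo_guidance not stripped")
    return problems
```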

Scenario 2 — L3 failure (TestL3FailurePath)

Same request shape, but the fake cortex returns an error. The test asserts:

  • 200 OK from oracle (the request itself didn't fail; the tool did).
  • Observations include tool_error instead of tool_output — oracle's M5 detect-and-emit logic correctly identified the JSON error envelope and routed to the error helper.
  • Trace stitching still holds on the failure path — same trace_id across the error observation.

Scenario 3 — missing traceparent (TestMissingTraceparent)

Beacon forgets to send the header. The test asserts:

  • 200 OK (best-effort mode).
  • TraceparentMiddleware minted a fresh 32-hex trace-id.
  • Every observation in the run carries the same minted trace-id — lineage stitches even when the caller didn't mint one.

(The apollo_missing_traceparent_total counter increments; the test doesn't assert on counters specifically, but the production flow exercises that path.)
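The validate-or-mint behavior can be sketched as follows. This is not TraceparentMiddleware's code: the function name is hypothetical, and the sketch accepts only the four-field W3C traceparent layout with lowercase hex.

```python
import re
import secrets
from typing import Optional, Tuple

# version - trace-id (32 hex) - parent-id (16 hex) - flags
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def resolve_trace_id(header: Optional[str]) -> Tuple[str, bool]:
    """Return (trace_id, minted). Mint a fresh 32-hex id if the header is absent or invalid."""
    match = _TRACEPARENT.match(header.strip()) if header else None
    if match:
        version, trace_id, parent_id, _flags = match.groups()
        # W3C Trace Context forbids version ff and all-zero trace or parent ids.
        if version != "ff" and trace_id != "0" * 32 and parent_id != "0" * 16:
            return trace_id, False
    return secrets.token_hex(16), True
```

The `minted` flag is where a counter such as apollo_missing_traceparent_total would be incremented.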

Scenario 4 — subscriber SDK shape contract (TestCortexGuidanceConsumption)

The test takes the apollo_guidance payload cortex received and feeds it into axonis.core.oracle.guidance_cache.ApolloGuidanceCache.update(…) directly. Asserts the payload has the expected {as_of, artifacts, rationale_summary} shape and the SDK accepts it without error — proving the on-wire contract matches the consumer SDK even in the empty-artifact-set case (no real artifacts promoted).

What the test doesn't cover (scoped differences from live)

The integration test is authoritative for oracle's request path + Apollo integration under real code, but it does NOT exercise:

Not covered | Why | How to cover
------------|-----|-------------
Real JWT / Keycloak signature verification | Requires live SSO | Live deployment smoke test
Real Elasticsearch write/read | Adds flakiness + CI cost | Live deployment with seeded index templates
Real Redis conversation persistence | Same | Live deployment
Real LLM output quality | server.llm.router.complete is stubbed | Deploy with an LLM key and run through beacon's chat UI
Real network latency / TLS / cross-host traceparent | All in-process | Multi-host staging scenario
Cortex's real tool implementations | Fake cortex only returns canned responses | Run real cortex with domain packs loaded
Long-running ingest worker behavior | The test bypasses the queue (writes directly) | Live ES-backed deployment

For a live demo, the companion script below mirrors the test's assertions against real running services.

Live scenario script (running services)

Assumes:

  • Oracle running at localhost:8001 (per developers-environment/oracle/oracle.env).
  • Cortex running at localhost:8000 (or wherever ORACLE_SERVICES points at registration).
  • Parallax optional but recommended for variety.
  • Elasticsearch + Redis up and reachable from oracle.
  • A valid Keycloak user token exported as $USER_TOKEN, and an admin token as $ADMIN_TOKEN.
  • Every APOLLO_* variable comes from developers-environment/conf/development.axonis.ai.env — the canonical home for Apollo config, shared across oracle, parallax, cortex, and beacon. The env file mirrors every variable Apollo's settings.py reads, grouped by subsystem (LLM, ingest, graphs, evaluator, curator, audit, maintenance, trace propagation, drift detection). For a live-LLM demo, override APOLLO_LLM_PROVIDER=openai + APOLLO_LLM_BASE_URL + APOLLO_LLM_API_KEY (see SPEC-PLATFORM-14-APOLLO.md §Apollo's LLM). For a stub-LLM plumbing-only run, override APOLLO_LLM_PROVIDER=stub.

Step 1 — confirm services are up

curl -sf http://localhost:8001/health | jq
curl -sf http://localhost:8000/health | jq                 # cortex
curl -sf http://localhost:8001/service-info | jq           # oracle's own info
curl -sf -H "Authorization: Bearer $ADMIN_TOKEN" \
  http://localhost:8001/api/v1/apollo/stats | jq '.milestone, (.metrics | keys[])' | head -20

Expected: oracle returns "status": "ok", cortex returns healthy, /stats returns the current milestone (M14 at time of writing) plus every Apollo counter at zero.

Step 2 — send the beacon request

TRACE_ID=$(openssl rand -hex 16)
SPAN_ID=$(openssl rand -hex 8)
TP="00-${TRACE_ID}-${SPAN_ID}-01"

curl -sS -X POST http://localhost:8001/api/v1/chat \
  -H "Authorization: Bearer $USER_TOKEN" \
  -H "traceparent: $TP" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "find recent activity for customer cust_42",
    "conversation_id": "demo_1"
  }' | jq '. | {response, tool_calls, apollo_guidance}'

Expected: a JSON response with response, tool_calls (non-empty when the tool-use loop fired), and apollo_guidance populated with {as_of, artifacts, rationale_summary}. Pre-promote, artifacts is [].
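
The traceparent header built in Step 2 follows the W3C Trace Context layout (version, 32-hex trace id, 16-hex parent span id, 2-hex flags). As a minimal sketch for scripted demos, the same value can be generated in Python; the function name make_traceparent is hypothetical and not part of oracle or Apollo.

```python
import re
import secrets

# W3C Trace Context layout: version-traceid-parentid-flags, all lowercase hex.
TRACEPARENT_RE = re.compile(r"^00-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}$")


def make_traceparent(sampled: bool = True) -> tuple[str, str]:
    """Return (trace_id, traceparent), shaped like Step 2's $TRACE_ID and $TP."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex chars
    span_id = secrets.token_hex(8)    # 8 random bytes  -> 16 hex chars
    flags = "01" if sampled else "00"
    return trace_id, f"00-{trace_id}-{span_id}-{flags}"


trace_id, traceparent = make_traceparent()
```

Keeping trace_id alongside the header is what lets Step 4 query /memories for exactly this request.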

Step 3 — tail the admin SSE feed (in another terminal)

curl -N -H "Authorization: Bearer $ADMIN_TOKEN" \
  "http://localhost:8001/api/v1/apollo/guidance/stream?scope=*"

Nothing prints yet (no Curator commits). Leave this open for Step 6.

Step 4 — verify the lineage landed

# Every observation oracle emitted for this request.
curl -s -H "Authorization: Bearer $ADMIN_TOKEN" \
  "http://localhost:8001/api/v1/apollo/memories?trace_id=${TRACE_ID}&limit=50" \
  | jq '[.observations[] | {event_type, service, payload: (.payload | keys)}]'

Expected: array includes user_prompt, one-or-more llm_turn, tool_output (or tool_error), final_response — all with the same trace_id. tool_output.service is the L3 service oracle dispatched to (e.g., "cortex").
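
The expectation above can also be checked mechanically. The sketch below assumes each observation is a dict carrying event_type and trace_id keys; that shape is inferred from the jq filter and the prose, not confirmed against the real /memories response, and check_lineage is a hypothetical helper.

```python
REQUIRED_EVENTS = {"user_prompt", "llm_turn", "final_response"}
TOOL_EVENTS = {"tool_output", "tool_error"}


def check_lineage(observations: list[dict], trace_id: str) -> list[str]:
    """Return human-readable problems; an empty list means the lineage looks complete."""
    problems = []
    seen = {obs.get("event_type") for obs in observations}
    for missing in sorted(REQUIRED_EVENTS - seen):
        problems.append(f"missing event_type: {missing}")
    if not (seen & TOOL_EVENTS):
        problems.append("missing tool_output or tool_error")
    for obs in observations:
        # Every observation for the request must share the one trace_id.
        if obs.get("trace_id") != trace_id:
            problems.append(f"foreign trace_id on {obs.get('event_type')}")
    return problems


sample = [
    {"event_type": "user_prompt", "trace_id": "abc"},
    {"event_type": "llm_turn", "trace_id": "abc"},
    {"event_type": "tool_output", "trace_id": "abc", "service": "cortex"},
    {"event_type": "final_response", "trace_id": "abc"},
]
problems = check_lineage(sample, "abc")
```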

Step 5 — check synthesis triggered + drift-checked a proposal

curl -s -H "Authorization: Bearer $ADMIN_TOKEN" \
  http://localhost:8001/api/v1/apollo/artifacts | jq '.count'

Expected: at least one pending proposal (if the message hit a failure-pattern or intent-pattern synthesis path). Each proposal has status: "approved" or status: "drift_flagged" with per-check drift detail.

Step 6 — admin promotes the proposal (M9 + M11)

Natural-language path:

curl -sS -X POST http://localhost:8001/api/v1/apollo/chat \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "action": "chat",
    "message": "promote the approved FailurePattern proposal you just made"
  }' | jq '. | {response, tool_calls}'

Oracle's admin-chat LLM picks promote_artifact, writes the audit record, and narrates the result. In the other terminal, the SSE feed prints:

data: {"event": "curator_commit", "action": "promote", "artifact_id": "fp_<hash>", "actor": "admin:<you>", "ts": "..."}
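
For scripted demos, the feed can be decoded rather than eyeballed. This is a minimal sketch that assumes each event arrives as a single data: line holding one JSON document, as in the example above; parse_sse_data_lines is a hypothetical helper, and multi-line data fields (which the SSE format permits) are not handled.

```python
import json


def parse_sse_data_lines(raw: str) -> list[dict]:
    """Decode every `data:` line of an SSE chunk into a dict.

    Comment lines (starting with ':') and other fields such as `retry:` are skipped.
    """
    events = []
    for line in raw.splitlines():
        if line.startswith("data:"):
            events.append(json.loads(line[len("data:"):].strip()))
    return events


chunk = (
    ": keepalive\n"
    "retry: 3000\n"
    'data: {"event": "curator_commit", "action": "promote", "artifact_id": "fp_example"}\n'
    "\n"
)
events = parse_sse_data_lines(chunk)
```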

Direct REST path:

PROP_ID=$(curl -s -H "Authorization: Bearer $ADMIN_TOKEN" \
  http://localhost:8001/api/v1/apollo/artifacts | jq -r '.pending[0].id')

curl -sS -X POST http://localhost:8001/api/v1/apollo/artifacts/fp_live_demo/promote \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d "{\"proposal_id\": \"${PROP_ID}\", \"rationale\": \"demo promote\"}" | jq

Step 7 — re-run the same /chat to see guidance flow through

Re-send Step 2's request. This time:

  • apollo_guidance.artifacts contains the promoted FailurePattern.
  • Cortex's inbound MCP dispatch carries the artifact in arguments.apollo_guidance.artifacts.
  • Cortex's MCP handler (M14) pops the field, populates a request-scoped ApolloGuidanceCache via the contextvar; the cortex tool's next LLM call folds get_active_failure_patterns(...) into its system prompt. Captured proof of this end-to-end: oracle/docs/archive/M14-CORTEX-CONSUMPTION-PROOF.md + oracle/docs/proof/.

Step 8 — audit the trail

curl -s -H "Authorization: Bearer $ADMIN_TOKEN" \
  "http://localhost:8001/api/v1/apollo/audit?artifact_id=fp_live_demo" \
  | jq '[.records[] | {action, actor, trigger, rationale, indefinite}]'

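Expected: one promote record with actor: "admin:<you>", trigger: "admin_chat" (or admin_endpoint), and a non-empty rationale. If you promoted through the chat path, the artifact id is the fp_<hash> value printed on the SSE feed in Step 6, so query that id instead of fp_live_demo.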

What a presenter walks through

Suggested demo flow (15 minutes):

  1. Open Step 1 and Step 2 — show oracle healthy, send one /chat, surface the response with the apollo_guidance field.
  2. Open Step 4 — show /memories?trace_id=... returning the full lineage under one trace_id. This is Apollo observing.
  3. Open Step 5 — show /artifacts returning one or more pending proposals. This is Apollo synthesizing.
  4. Open Step 6 with the chat path — admin says "promote it"; the LLM calls promote_artifact; the SSE feed prints curator_commit live. This is Apollo advising + admin acting.
  5. Open Step 7 — re-run the /chat; apollo_guidance now has the promoted artifact; cortex sees it in its tool arguments. This is Apollo's learning reaching L3 via oracle.
  6. Open Step 8 — audit record shows actor, rationale, and indefinite flag semantics. This is Apollo's trail.

Optional: run the in-process integration test as a "here's how we prove this in CI" companion:

. .venv/bin/activate
pytest oracle/tests/test_integration_beacon_to_l3.py -v -s

The test runs 4 scenarios; all are deterministic and expected to pass.

Cross-references

  • Contract: §Ingest Semantics, §Injection Channel, §Trace Propagation
  • Build order: §Implementation Plan M5 (oracle-sole-observer), M6 (traceparent), M7 (admin surface), M9 (curator)
  • Narrative: §Design Journey "What a live /chat request looks like today"
  • Runtime call graph: §Technical Overview "The /chat call graph"

File & Function Inventory

Map of every file that exists for Apollo (Apollo's own apollo/ tree plus Apollo-specific files in oracle's server tree). Each file is listed under its package; each entry has a one-line file description, then a bullet list of its public functions/classes with a 1-2 sentence description per item.

apollo/ package root

apollo/__init__.py — Package marker; module docstring states Apollo is the observation/learning/guidance layer mounted into oracle at /api/v1/apollo/*. No re-exports.

apollo/app.py — Builds Apollo's FastAPI sub-app and owns the five background loops (snapshot, autonomous curator, maintenance, synthesis sweep, coalescer) that oracle's Starlette lifespan starts and stops.

  • startup() — async; called from oracle's lifespan. Bootstraps indices, wires the active-artifact source, prewarms the active-set cache, starts the ingest queue/workers, and spawns the five periodic tasks (including the Layer 6-C coalescer loop).
  • shutdown() — async; signals each loop's stop event (snapshot, curator_auto, maintenance, synthesis_sweep, coalescer), awaits with a bounded timeout, drains the admin-SSE hub, then stops the ingest workers.

apollo/artifacts.py — Pydantic schemas for every artifact type listed in §Memory Model → Artifact types, plus the AttachedGuidance envelope that rides on outbound responses/dispatches. Defines which types are attachable vs admin-only.

  • ArtifactType(Enum) — full set of typed artifacts Apollo may produce; adding a new type requires extending this enum and adding a *Content class.
  • ATTACHABLE_TYPES — frozenset of the six artifact types the SDK's canonical accessors consume; others are admin/audit-only.
  • Applicability(BaseModel) — scope filter (intent_class, layer, service_name, tool_name, tags) the selector evaluates per artifact.
  • PromptShimContent, SpecFragmentContent, ToolPairingHintContent, FailurePatternContent, ServiceConnectionHintContent, IntentPatternContent — content models for the six attachable types.
  • DriftEventContent, DecisionTrajectoryContent, IntentSchemaContent, SchemaDriftContent, PromptShapeContent, CapabilityMapContent — admin/audit-only content models declared for forward compatibility.
  • validate_content(artifact_type, content) — coerce a raw content dict against its type's schema; the Curator calls this at commit time.
  • ApolloArtifact(BaseModel) — outer envelope per artifact (id, type, version, applicability, content, rationale, as_of).
  • AttachedGuidance(BaseModel) — the apollo_guidance wire payload (as_of, artifacts, rationale_summary); intentionally slim per §Injection Channel → Payload shape.
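
To make the envelope shape concrete, here is a rough sketch using stdlib dataclasses as stand-ins for the Pydantic models. Only the field names come from the inventory above; the field types, the defaults, and the content key "summary" are assumptions for illustration.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class Applicability:
    # Scope filter fields named in the inventory; None is assumed to mean "any".
    intent_class: Optional[str] = None
    layer: Optional[str] = None
    service_name: Optional[str] = None
    tool_name: Optional[str] = None
    tags: list[str] = field(default_factory=list)


@dataclass
class ApolloArtifact:
    id: str
    type: str
    version: int
    applicability: Applicability
    content: dict[str, Any]
    rationale: str
    as_of: str


@dataclass
class AttachedGuidance:
    as_of: str
    artifacts: list[ApolloArtifact] = field(default_factory=list)
    rationale_summary: str = ""


artifact = ApolloArtifact(
    id="fp_example",
    type="FailurePattern",
    version=1,
    applicability=Applicability(service_name="cortex", tool_name="search"),
    content={"summary": "placeholder"},  # hypothetical content field
    rationale="demo promote",
    as_of="2025-01-01T00:00:00Z",
)
guidance = AttachedGuidance(
    as_of=artifact.as_of,
    artifacts=[artifact],
    rationale_summary="1 FailurePattern (fp_example)",
)
payload = asdict(guidance)  # nested dataclasses become plain dicts
```

An empty AttachedGuidance (artifacts == []) corresponds to the pre-promote response described in the live scenario.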

apollo/llm.py — Apollo's own LLM client (M8). Pluggable via APOLLO_LLM_PROVIDER across openai/minimax/anthropic/minimax-local/stub providers; supports both blocking complete() and token-streaming stream().

  • ToolCall — dataclass; normalized provider tool-call (id, name, arguments dict).
  • StreamChunk — dataclass; one delta from stream(), carrying either content_delta or terminal final LLMResponse.
  • LLMResponse — dataclass; flat provider-agnostic response with as_json() tolerant parser.
  • LLMClient — front-door singleton with get(), reset_singleton(), install_stub_response(), install_stub_stream(), and provider-dispatching complete() / stream() calls.

apollo/maintenance.py — Hourly background loop (M13) that purges expired docs via delete_by_query, coarsens hourly→daily→weekly snapshots, reconciles orphaned artifact-history rows, and emits maintenance metrics. Also exposes read-side helpers for /stats.

  • run_periodic(stop_event) — async loop; sleeps in short bounded waits so shutdown is prompt.
  • run_once(now=...) — one maintenance pass; injectable now for tests, returns a summary dict.
  • degraded_emitters() — scan per-service last-ingest timestamps and flag any stale beyond APOLLO_INGEST_STALE_WARN_SEC.
  • intent_schema_coverage(window_hours) — percentage of recent traces carrying an intent_schema observation; None when no data in window.

apollo/metrics.py — Prometheus counter/gauge declarations for every Apollo metric named in the spec. Registered at import time so dashboards can be wired before drivers exist.

  • Module-level metrics for ingest (INGEST_ACCEPTED_TOTAL, INGEST_QUEUE_DEPTH, etc.), guidance attach, traceparent propagation, synthesis sweep, maintenance loop, and curator mutations/policy violations/atomic failures/orphan detection.
  • Guidance-attach counters/histograms: GUIDANCE_ATTACH_NULL_TOTAL (scope, reason), GUIDANCE_ATTACH_SUCCESS_TOTAL (scope), GUIDANCE_ATTACH_PAYLOAD_BYTES (scope), GUIDANCE_ATTACH_ARTIFACT_COUNT (scope), GUIDANCE_ATTACH_CAPPED_TOTAL (scope, artifact_type).
  • Evaluator score-persist counters (Layer 4-A): EVALUATOR_SCORE_PERSISTED_TOTAL, EVALUATOR_SCORE_PERSIST_FAILED_TOTAL.
  • Coalescer counters (Layer 6-C): COALESCER_PROPOSALS_EMITTED_TOTAL, COALESCER_MERGE_FAILED_TOTAL.
  • snapshot() — read every counter/gauge into a dict shape used by GET /stats; skips synthetic _created samples.

apollo/admin/

apollo/admin/__init__.py — Package marker (M7); re-exports admin_router and SSEHub.

apollo/admin/api.py — Admin-only REST endpoints (M7→M13). Every route gated on atlasfl-admin via require_admin; covers memory CRUD, lineage, artifacts, audit, curator mutations, divergence audit, provenance, guidance preview, subscriber/SSE inspection.

  • require_admin(request) — FastAPI dependency; raises 403 unless token payload carries atlasfl-admin. Returns the payload for attribution.
  • SeedObservationRequest, MemoryPatchRequest, PromoteRequest, DemoteRequest, ForgetRequest, EditRequest, RollbackRequest, LearnRequest — Pydantic bodies for the corresponding mutation endpoints. PromoteRequest carries supersede: bool = False for the description-override / coalescer conflict path.
  • list_memories(...), get_memory(uid), seed_memory(body), patch_memory(uid, body), forget_memory(uid) — memory CRUD; seeds flow through normal ingest.
  • get_lineage(...) — cross-trace lineage merging live AttributionRegistry with persisted apollo_lineage_events; entries tagged live | persisted | live+persisted.
  • get_capped_lineage(artifact_id, service=None, limit=500) — GET /lineage/capped: traces where the per-type attach cap held the artifact back (Layer 1 visibility).
  • get_artifact_stats(artifact_id, since=None) — GET /artifacts/{artifact_id}/stats: per-artifact attached / capped aggregate from apollo_lineage_events.
  • list_artifacts() — combined active artifacts (M9 persisted) and pending synthesis proposals (M8 in-memory).
  • promote_artifact(artifact_id, body), demote_artifact(...), forget_artifact(...), edit_artifact(...), rollback_artifact(...) — Curator mutation endpoints; translate CuratorPolicyViolation to 403, DescriptionOverrideConflict to 409, ValueError to 4xx.
  • trigger_learn(body) — admin-initiated synthesis pass (POST /learn); 202-accepts and runs the LLM on a background task.
  • list_recommendations() — pending Evaluator demotion recommendations (M10).
  • list_audit(...) — Curator audit log with action/actor/artifact/trigger/type filters.
  • get_provenance(artifact_id, ...) — trace an artifact back to audit chain, source proposal, and contributing observations (handles real-trace, sweep:*, and admin triggers).
  • list_divergence(...) — observations where caller_identity.username differs from emitted_by.token_subject; for audit support of Invariant 17.
  • preview_guidance(scope) — preview the L1 or L3: attachable artifact set right now.
  • list_subscribers(), guidance_stream(scope) — admin-SSE inspection and live feed of Curator commits.

apollo/admin/sse.py — Process-wide SSE hub for fanning Curator commits to admin clients in real time. M7 ships the pipe empty (no Curator commits yet); synthetic broadcasts are enough to prove wiring.

  • SSEHub — singleton-by-convention; subscribe(scope, username), unsubscribe(sub), broadcast(event, scope=None) (drops on slow consumer queue-full), subscribers_snapshot(), shutdown() (enqueues a sentinel), reset() (test helper).
  • sse_event_stream(sub) — async generator that serializes queued events as text/event-stream bytes with retry: preamble and 15s keepalive comments.
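
The "drops on slow consumer queue-full" behaviour can be illustrated with a bounded asyncio.Queue per subscriber. TinyHub below is a hypothetical stand-in, not the real SSEHub; the queue size and the dropped counter are assumptions for the sketch.

```python
import asyncio


class TinyHub:
    """Hypothetical stand-in for SSEHub: one bounded queue per subscriber."""

    def __init__(self, max_queue: int = 2) -> None:
        self._max_queue = max_queue
        self._subs: list[asyncio.Queue] = []
        self.dropped = 0

    def subscribe(self) -> asyncio.Queue:
        q: asyncio.Queue = asyncio.Queue(maxsize=self._max_queue)
        self._subs.append(q)
        return q

    def broadcast(self, event: dict) -> int:
        """Fan out without blocking; a full queue drops the event for that subscriber only."""
        delivered = 0
        for q in self._subs:
            try:
                q.put_nowait(event)
                delivered += 1
            except asyncio.QueueFull:
                self.dropped += 1
        return delivered


async def demo() -> tuple[int, int]:
    hub = TinyHub(max_queue=2)
    slow = hub.subscribe()  # never drained, so it fills up
    for i in range(3):
        hub.broadcast({"event": "curator_commit", "seq": i})
    return slow.qsize(), hub.dropped


queued, dropped = asyncio.run(demo())
```

Using put_nowait keeps a stalled admin client from blocking the broadcaster, at the cost of that client missing events.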

apollo/chat/

apollo/chat/__init__.py — Package marker for Apollo's admin-chat surface (M7 read-only, M11 action tools). No re-exports.

apollo/chat/conversation.py — Redis-backed admin-chat history keyed by (username, conversation_id). Distinct from oracle's user conversation store; falls back to an in-memory dict when Redis is unreachable.

  • AdminConversationStore — get(username, conversation_id), append(username, conversation_id, role, content) (trims to max turns), reset(username, conversation_id). Constructor accepts an injected client or uses cached health-checked one.

apollo/chat/explain.py — Three read-only helpers (M11) that surface audit records in a shape the admin-chat LLM consumes as tool_result.

  • explain_decision(audit_id|artifact_id|trace_id, actor) — return matching audit record(s) plus best-effort evidence_ref resolution (observations by trace_id, target version, etc.).
  • list_decisions(action, actor, since, limit, caller) — chronology of recent Curator actions; returns summary fields only.
  • discuss_decision(audit_id|artifact_id, actor) — load full context bundle (audit + current artifact + upstream artifacts) for a focused multi-turn thread.

apollo/chat/server.py — Admin-chat REST endpoints (M11) + §6.2 SSE streaming. POST /chat dispatches on action (list_tools | invoke | chat); POST /chat/stream is the streaming variant. LLM drives tool selection via OpenAI-native tool-calling.

  • ChatRequest, ChatStreamRequest — Pydantic bodies.
  • admin_chat(body) — entry point; dispatches on action.
  • admin_chat_stream(body) — streaming variant; forwards _chat_loop_events() output as SSE frames with disabled proxy buffering.

apollo/chat/tools.py — Admin-chat tool catalog (M11). Defines TOOL_DEFINITIONS metadata, TOOL_IMPLEMENTATIONS dispatch table, and the OpenAI-tools converter. Every tool takes an actor kwarg for audit attribution.

  • TOOL_DEFINITIONS — list of tool dicts (name, description, parameters, mutating flag) for read tools (list_memories, get_memory, list_decisions, explain_decision, discuss_decision, lineage) and mutation tools (forget_memory, promote/demote/forget/edit/rollback_artifact, rollback_graph, trigger_synthesis, pause/resume_curator).
  • to_openai_tools(definitions=None) — convert Apollo's TOOL_DEFINITIONS to OpenAI's tools=[...] JSON-Schema shape.
  • list_memories, get_memory, list_decisions_tool, explain_decision_tool, discuss_decision_tool, lineage — read-tool implementations.
  • forget_memory, promote_artifact, demote_artifact, rollback_artifact, forget_artifact, edit_artifact, rollback_graph, trigger_synthesis, pause_curator, resume_curator — mutation tool wrappers over oracle.curator actions.
  • TOOL_IMPLEMENTATIONS — name → callable dispatch table consumed by chat/server.py.

apollo/curator/

apollo/curator/__init__.py — Package marker (M9); re-exports promote, demote, forget, edit, rollback, ApolloAuditRecord, write_audit, CuratorPolicyViolation, CuratorPaused, ActionKind, ActionRequest, allow_or_raise, is_paused, raise_if_paused, set_paused, clear_paused.

apollo/curator/actions.py — The five Curator mutation verbs (M9). Each is a policy gate → history write → store mutation → audit write → SSE broadcast atomic sequence; per-stage failures bump CURATOR_ATOMIC_FAILURES_TOTAL and unwind partial work.

  • DescriptionOverrideConflict(Exception) — raised by promote() when another active artifact already overrides the same (service, tool) description; admin API maps to 409.
  • ActionResult — dataclass returned by every action; carries similar_artifacts: list[dict] for the Layer 6-B promote advisory; to_dict() flattens for HTTP responses.
  • promote(artifact_id, proposal, actor, rationale, trigger, evidence_ref, admin_note, supersede=False) — lift an approved proposal into the active set as version 1 (or N+1 if prior exists). When supersede=True, atomically demotes both description-override conflicts and any proposal.supersedes: [...] coalescer cluster members (Layer 6-C).
  • demote(artifact_id, actor, rationale, ...) — hide from guidance without deleting; sets status="demoted" and bumps version. Drops any matching evaluator recommendation.
  • forget(artifact_id, actor, rationale, ...) — delete current artifact; writes indefinite: true audit so the action is never purged.
  • edit(artifact_id, actor, rationale, content_patch, applicability_patch, ...) — patch metadata or content; bumps version; requires at least one patch.
  • rollback(artifact_id, target_version, actor, rationale, ...) — restore prior version's content as a new version with prev_version_id pointing at the pre-rollback record. Indefinite audit.
  • _find_description_override_conflicts(*, artifact_id, applicability, content) — return active artifacts that would shadow this promote on the receiver's get_tool_description_overrides() path; gate only triggers when content.description_override and applicability.tool_name are both set.
  • _embed_and_find_similar(*, artifact_id, artifact_type, content, applicability) — Layer 6: compute the new artifact's embedding and return (embedding, similar_artifacts); never raises.
  • _load_active_set_for_similarity(artifact_type) — pull every status=active artifact of one type for the similarity scan.

apollo/curator/audit.py — Curator audit-log model and writer (M9). Schema per §Curator → Audit log; rationale is required non-empty.

  • ApolloAuditRecord(BaseModel) — fields per SPEC-14; field validators enforce non-empty rationale and actor. as_document() computes expires_ts (null when indefinite=True, else +APOLLO_AUDIT_RETENTION_DAYS).
  • write_audit(record, store=None) — persist a record, emit the canonical oracle.curator.audit log line, return the record_id.
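
The retention rule described for as_document() can be sketched as follows. This is an illustration with a stdlib dataclass, not the real ApolloAuditRecord; the 90-day value and the reduced field set are assumptions (the real window comes from APOLLO_AUDIT_RETENTION_DAYS).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

AUDIT_RETENTION_DAYS = 90  # assumed value; the real one is read from APOLLO_AUDIT_RETENTION_DAYS


@dataclass
class AuditRecordSketch:
    action: str
    actor: str
    rationale: str
    ts: datetime
    indefinite: bool = False

    def __post_init__(self) -> None:
        # Mirror the documented validators: rationale and actor must be non-empty.
        if not self.rationale.strip():
            raise ValueError("rationale is required")
        if not self.actor.strip():
            raise ValueError("actor is required")

    def expires_ts(self) -> Optional[datetime]:
        if self.indefinite:
            return None  # indefinite records are never purged
        return self.ts + timedelta(days=AUDIT_RETENTION_DAYS)


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
promote = AuditRecordSketch("promote", "admin:demo", "demo promote", now)
forget = AuditRecordSketch("forget", "admin:demo", "cleanup", now, indefinite=True)
```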

apollo/curator/auto.py — Autonomous Curator background driver (M12). Sweeps the synthesis pending list (evolution-class proposals) and the evaluator recommendation queue (demote/fast_demote) and commits them with actor="curator_auto". Drift-class items stay for admin review.

  • derive_artifact_id(proposal) — deterministic typed-prefix-plus-SHA256 id (fp_*, ip_*, ps_*, sf_*, art_*) so repeated proposals for the same logical artifact converge on the same id.
  • sweep_once() — one autonomous-commit pass; short-circuits with ran=False when disabled or paused.
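
A typed-prefix-plus-SHA256 id might be derived along these lines. Which proposal fields feed the hash, the prefix-to-type mapping, and the 16-character truncation are all assumptions made for this sketch; only the prefixes themselves and the use of SHA256 come from the description above.

```python
import hashlib
import json

# Assumed mapping from artifact type to the documented prefixes; anything else gets art_.
PREFIXES = {
    "FailurePattern": "fp",
    "IntentPattern": "ip",
    "PromptShim": "ps",
    "SpecFragment": "sf",
}


def derive_artifact_id_sketch(proposal: dict) -> str:
    prefix = PREFIXES.get(proposal.get("type", ""), "art")
    # Hypothetical identity: type plus applicability, serialized canonically
    # so key order does not change the digest.
    identity = {
        "type": proposal.get("type"),
        "applicability": proposal.get("applicability", {}),
    }
    canonical = json.dumps(identity, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{prefix}_{digest[:16]}"


first = derive_artifact_id_sketch(
    {"type": "FailurePattern", "applicability": {"service_name": "cortex", "tool_name": "search"}}
)
second = derive_artifact_id_sketch(
    {"applicability": {"tool_name": "search", "service_name": "cortex"}, "type": "FailurePattern"}
)
```

Because the serialization is canonical, first and second are the same id even though the dict keys arrive in a different order.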

apollo/curator/pause.py — Process-wide pause flag (M11) freezing every Curator mutation. State is non-persistent by design.

  • PauseState — dataclass with snapshot().
  • CuratorPaused(Exception) — raised by mutations when the flag is on; carries the pause state for 409 surfaces.
  • is_paused(), snapshot(), raise_if_paused() — pause-state accessors.
  • set_paused(actor, rationale, admin_note=None) — flip on, write indefinite audit, broadcast SSE; idempotent.
  • clear_paused(actor, rationale, admin_note=None) — flip off, write indefinite audit, broadcast SSE; idempotent.
  • reset() — test helper.

apollo/curator/policy.py — Hard-invariant policy gate (M9). Every Curator action calls allow_or_raise(...) first; violations raise CuratorPolicyViolation and increment CURATOR_POLICY_VIOLATIONS_TOTAL. Keys on action shape, not artifact content.

  • CuratorPolicyViolation(Exception) — carries rule id and human-readable detail.
  • ActionKind(Enum) — full set of action verbs (promote/demote/forget/edit/rollback/compact + M11 pause/resume/trigger_synthesis/graph_rollback).
  • ActionRequest — dataclass normalized for the gate (kind, actor, artifact_id, optional patches and target_version).
  • allow_or_raise(request) — sole public entry; runs every rule check and increments the violations counter on raise.

apollo/evaluator/

apollo/evaluator/__init__.py — Package marker (M10); re-exports AttributionRegistry, cascade, recommendations, scoring, and signals public surface. M10 ships the Evaluator as an advisor (writes recommendations; M12 flips autonomous commit).

apollo/evaluator/attribution.py — Trace → applied-artifact-ids registry. Oracle's attacher records every attachment here at dispatch time so the evaluator can attribute signals back to the right artifacts.

  • AttributionRegistry — record(trace_id, scope, artifact_ids), applied_for(trace_id, service_name=None), traces_with_artifact(artifact_id, service_name=None), prune(now=None) (TTLs entries via APOLLO_GRAPH_TRACE_STATE_TTL_SEC), snapshot(), reset().
  • get(), reset() — module-level default-instance accessors.

apollo/evaluator/cascade.py — Upstream flagging plus DriftEvent vs silent-demote decision logic. Pure function — translates score state into a recommendation outcome.

  • CascadeOutcome — dataclass (artifact_id, action, reason, upstream_flagged).
  • cascade_on_l3_dominant(engine, artifact_id, upstream_ids=None) — pick drift_event (≥3 L3 signals in window), recommend_fast_demote (N=2 L3-dominant ticks), recommend_demote (N=5 sub-threshold ticks), or none. Flags upstream on every non-none branch.
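
The branch thresholds listed for cascade_on_l3_dominant can be written as a small pure function. The thresholds are taken from the description above; the ScoreState container, its field names, and the precedence order (checked in the order the branches are listed) are assumptions for this sketch.

```python
from dataclasses import dataclass


@dataclass
class ScoreState:
    # Hypothetical summary of what the scoring engine tracks per artifact.
    l3_signals_in_window: int
    l3_dominant_ticks: int
    sub_threshold_ticks: int


def cascade_action(state: ScoreState) -> str:
    """Map score state to one of the four documented outcomes."""
    if state.l3_signals_in_window >= 3:
        return "drift_event"
    if state.l3_dominant_ticks >= 2:
        return "recommend_fast_demote"
    if state.sub_threshold_ticks >= 5:
        return "recommend_demote"
    return "none"


outcome = cascade_action(ScoreState(l3_signals_in_window=1, l3_dominant_ticks=2, sub_threshold_ticks=0))
```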

apollo/evaluator/persist.py — Layer 4-A — writes evaluator scores back to apollo_artifacts so the attach sort key can read them. Fire-and-forget via the event loop, kill-switched by APOLLO_EVALUATOR_PERSIST_SCORES_ENABLED, and pytest-safe.

  • persist_score_to_artifact(artifact_id, score, decomposition) — schedule an ES update via a Painless script so the existing type-specific content fields survive.

apollo/evaluator/recommendations.py — Per-artifact pending demotion recommendation queue surfaced on GET /recommendations. Replace-semantics — latest recommendation per artifact wins; admin acts via the M9 demote endpoint.

  • Recommendation — dataclass (artifact_id, kind, reason, evaluator_score, score_decomposition, upstream_artifact_ids, created_at); to_dict().
  • RecommendationQueue — add(rec), remove(artifact_id), get(artifact_id), snapshot(), reset(), __len__.
  • get_queue(), reset_queue() — module-level singleton accessors.

apollo/evaluator/scoring.py — Per-artifact rolling EMA score with preserved per-signal decomposition. Weight tiers L3-error 3.0 / schema-mismatch 3.0 / user-feedback 1.5 / evaluator-confidence 0.5; score < 0.5 triggers demotion cadence.

  • SignalKind(Enum) — L3_ERROR / SCHEMA_MISMATCH / USER_FEEDBACK / EVALUATOR_CONFIDENCE.
  • ArtifactScore — dataclass holding the rolling score, signal counts/magnitudes, tick counters, L3-dominant counter, recent L3 timestamps; decomposition() returns audit-ready dict.
  • ScoringEngine — score_for(artifact_id), snapshot(), apply_signal(artifact_id, signal_kind, magnitude, now=None) (pure EMA math), repeated_l3_failures_in_window(artifact_id, now=None), reset().
  • get_engine(), reset_engine() — module-level singleton accessors.

apollo/evaluator/signals.py — Failure-signal detection: maps one observation to zero-or-more SignalHits per applied artifact. Schema-mismatch fires only when the observed service is component_kind == "agent".

  • SignalHit — frozen dataclass (artifact_id, signal_kind, magnitude, source_trace_id, source_event_type).
  • detect_signals(envelope, applied_artifact_ids, intent_schema_for_trace=None, confidence_gap=None) — dispatch on event_type and produce signal hits; returns [] when no artifacts attributed.

apollo/guidance/

apollo/guidance/__init__.py — Package marker; module docstring describes the two entry points (attacher in-process helpers and api admin REST inspection). No re-exports.

apollo/guidance/api.py — Apollo's public REST entry points: POST /observations (L3 ingest) and GET /stats (metric snapshot).

  • post_observations(request) — accept a validated ObservationBatch; stamps caller_identity from the Bearer token when emitter didn't, always overwrites emitted_by server-side, returns 202 with accepted/dropped counts.
  • get_stats() — JSON snapshot of every registered metric plus M13 additions (degraded_emitters, intent_schema_coverage) and a guidance_health block with per-scope success / null-by-reason breakdown and null-rate.
  • _guidance_health(metrics) — derive the at-a-glance per-scope attach health block from the flat metrics snapshot.

apollo/guidance/attacher.py — In-process attach helpers oracle calls when composing outbound envelopes. Bounded by APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS; failures and timeouts return None so the request still succeeds without guidance.

  • load_active_artifacts_from_es() — read every status=active artifact from apollo_artifacts and project to ApolloArtifact; cached for _ACTIVE_SET_TTL_SEC (5s); skipped under pytest.
  • set_active_set_source(src), reset_active_set_source() — install/restore the active-set provider hook (M8 swaps in the Curator-backed source).
  • for_l1(user, intent_class, caller_tags, trace_id) — build apollo_guidance for an L1 /chat response.
  • for_l2(user, intent_class, caller_tags, trace_id) — build the L2 in-process payload that oracle's tool-executor folds into its prompt (M15).
  • for_l3_agent(service_name, tool_name, intent_class, caller_tags, trace_id) — build the payload injected into an MCP dispatch's arguments.apollo_guidance.
  • _cap_for_type(artifact_type) — return the configured per-attach cap for one artifact type, or None if uncapped.
  • _safe_float(value, default) — coerce to float with a defensive fallback; lets _sort_key tolerate missing ranking signals.
  • _sort_key(artifact) — five-tier priority chain consulted at cap time and read by the receiver: evaluator_score → confidence → applicability specificity → weight → as_of.
  • _apply_attach_caps(matched, *, scope_label) — sort each type's matches by _sort_key and keep the top-N; returns (kept, dropped_pairs) where dropped_pairs is [(artifact_id, artifact_type), …] for held-back artifacts.
  • _summarize(artifacts, capped_pairs=None) — emit a per-type summary string ("N type (id1,id2,…+M) +C capped (cid1,…)") that diffs cleanly across calls.
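
The cap logic (rank each type's matches by the five-tier key, keep the top N, report what was held back) might look roughly like this. Artifacts are plain dicts here, the "higher ranks first" direction is assumed for every tier, specificity is approximated as the count of populated applicability fields, and caps is a hypothetical stand-in for whatever _cap_for_type reads.

```python
from collections import defaultdict
from typing import Optional


def sort_key(artifact: dict) -> tuple:
    """Five-tier key; larger tuples rank first (assumed direction for every tier)."""
    app = artifact.get("applicability", {})
    specificity = sum(1 for v in app.values() if v)  # populated scope fields
    return (
        float(artifact.get("evaluator_score", 0.0)),
        float(artifact.get("confidence", 0.0)),
        specificity,
        float(artifact.get("weight", 0.0)),
        artifact.get("as_of", ""),  # same-format ISO-8601 strings sort chronologically
    )


def apply_attach_caps(
    matched: list[dict], caps: dict[str, Optional[int]]
) -> tuple[list[dict], list[tuple[str, str]]]:
    by_type: dict[str, list[dict]] = defaultdict(list)
    for art in matched:
        by_type[art["type"]].append(art)
    kept: list[dict] = []
    dropped: list[tuple[str, str]] = []
    for art_type, group in by_type.items():
        ranked = sorted(group, key=sort_key, reverse=True)
        cap = caps.get(art_type)  # None means uncapped
        keep_n = len(ranked) if cap is None else cap
        kept.extend(ranked[:keep_n])
        dropped.extend((a["id"], art_type) for a in ranked[keep_n:])
    return kept, dropped


matched = [
    {"id": "a", "type": "FailurePattern", "evaluator_score": 0.9},
    {"id": "b", "type": "FailurePattern", "evaluator_score": 0.2},
    {"id": "c", "type": "FailurePattern", "evaluator_score": 0.5},
    {"id": "d", "type": "IntentPattern", "evaluator_score": 0.1},
]
kept, dropped = apply_attach_caps(matched, {"FailurePattern": 2})
```

Returning the dropped pairs alongside the kept list is what allows capped artifacts to be reported separately from attached ones.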

apollo/guidance/selectors.py — The artifact-applicability matcher. Filters an active set to those whose Applicability matches the caller context.

  • match_artifacts(active_set, layer, intent_class, service_name, tool_name, caller_tags) — return artifacts whose type is in ATTACHABLE_TYPES and whose applicability fields match the caller; admin/audit-only types never leak.

apollo/hooks/

apollo/hooks/__init__.py — Package marker; oracle-side in-process emission hooks. Apollo is a package inside oracle (§Package Structure), so oracle emits via direct calls to these helpers rather than HTTP. No re-exports.

apollo/hooks/chat.py — Emit helpers oracle's REST and LLM paths call to feed Apollo. Every helper is fire-and-accept; failures log and never raise into oracle's request path.

  • emit_intent_schema(intent_schema, conversation_id, token_payload, trace_id=None) — emit intent_schema for an inbound L1 chat turn with an intent block.
  • emit_user_prompt(prompt, conversation_id, token_payload, trace_id=None, intent_class=None) — emit user_prompt for an inbound L1 chat turn.
  • emit_llm_turn(trace_id, conversation_id, token_payload, request_messages, response_content, model, ...) — emit llm_turn for each oracle LLM cycle (L2-only).
  • emit_tool_output(trace_id, conversation_id, token_payload, service_name, tool_name, arguments, output, latency_ms=None) — emit tool_output after a successful MCP dispatch (oracle observes on L3's behalf); strips injected apollo_guidance from arguments.
  • emit_tool_error(trace_id, conversation_id, token_payload, service_name, tool_name, arguments, error_message, ...) — emit tool_error after a failed MCP dispatch.
  • emit_final_response(trace_id, conversation_id, token_payload, response, ...) — emit final_response just before oracle returns to L1.

apollo/learner/

apollo/learner/__init__.py — Package marker; extractors update Decision Graphs deterministically on every observation (M2); synthesis LLM runs event-driven (M8). No re-exports.

apollo/learner/coalescer.py — Layer 6-C — periodic background loop that finds similarity clusters of active artifacts and queues LLM-merged proposals on apollo_proposals. Off by default (APOLLO_COALESCER_ENABLED); bounded by APOLLO_COALESCER_MAX_CLUSTERS_PER_RUN.

  • run_periodic(stop_event) — async loop; honors kill switch + APOLLO_COALESCER_INTERVAL_SEC.
  • run_sweep_once() — one pass; returns {clusters_found, proposals_emitted, skipped}. Never raises.

apollo/learner/drift.py — Graph-anchor drift check (M8). Four deterministic sub-checks gate every LLM-produced proposal before it becomes eligible for Curator commit.

  • CheckResult, DriftCheckResult — dataclasses; DriftCheckResult.approved is True iff all sub-checks passed; as_drift_event() shapes a DriftEvent body.
  • check_proposed_pattern_vs_edges(proposal, outcome_graph_edges) — proposed FailurePattern must match an observed error edge.
  • check_intent_classification_vs_clusters(proposal, intent_graph_nodes) — proposed IntentPattern must reference an existing intent cluster.
  • check_weight_swings(proposal, existing_weights, z_threshold=None) — proposed weight must lie within APOLLO_DRIFT_Z_SCORE stdevs of prior distribution.
  • check_trajectory_coherence(proposal, trajectory, tolerance=None) — proposed weight's direction of change must align with the reference EWMA trajectory.
  • run_all(proposal_id, proposal, outcome_graph_edges=None, ...) — aggregate every sub-check into a DriftCheckResult.
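
The weight-swing sub-check is a z-score test. The sketch below shows the arithmetic only; the default threshold of 3.0, the use of the sample standard deviation, and the handling of short or constant histories are assumptions, since the real values come from APOLLO_DRIFT_Z_SCORE and the implementation.

```python
import statistics


def weight_swing_ok(proposed: float, existing: list[float], z_threshold: float = 3.0) -> bool:
    """True when the proposed weight lies within z_threshold stdevs of the prior weights."""
    if len(existing) < 2:
        return True  # not enough history to estimate a spread (assumed policy)
    mean = statistics.fmean(existing)
    stdev = statistics.stdev(existing)  # sample standard deviation
    if stdev == 0.0:
        return proposed == mean  # constant history: any change is a swing (assumed policy)
    return abs(proposed - mean) / stdev <= z_threshold


history = [0.4, 0.5, 0.6]  # mean 0.5, sample stdev about 0.1
small_step = weight_swing_ok(0.55, history)
large_jump = weight_swing_ok(0.95, history)
```

With this history, 0.55 sits about half a standard deviation from the mean and passes, while 0.95 is roughly 4.5 standard deviations away and is flagged.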

apollo/learner/extractors.py — Rule-based observation → Decision Graph mutations. Deterministic; one entry per event type. Service-namespaces every naturally-service-scoped label.

  • apply(envelope, graph_set) — dispatch the envelope across every extractor; per-extractor exceptions logged and swallowed so a buggy extractor can't block the others.

apollo/learner/graphs.py — In-memory state of the five Decision Graphs plus per-trace scratchpad. Idempotent per-trace — re-observing the same trace_id is a no-op against the same node/edge.

  • Module constants INTENT_TOOL_GRAPH, PROMPT_SHAPE_GRAPH, SERVICE_ROUTING_GRAPH, OUTCOME_GRAPH, ITERATION_GRAPH, ALL_GRAPH_IDS.
  • Node, Edge — pure dataclasses with EWMA weights on edges and outcome distributions on nodes.
  • make_node_id(graph_id, kind, label), make_edge_id(graph_id, source_id, target_id) — deterministic SHA1-based ids.
  • DecisionGraph — upsert_node(kind, label, trace_id, at, outcome=None, tags=None), upsert_edge(source_id, target_id, trace_id, at, success), drain_dirty() (clears the dirty sets), reset().
  • GraphSet — owns the five graphs and the per-trace scratchpad; graph(graph_id), all_nodes(), all_edges(), trace_scratch(trace_id) (lazy TTL eviction), drain_all_dirty(), load_from_records(nodes, edges), reset().
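
Deterministic SHA1-based ids and an EWMA edge weight can be sketched as below. The "|" separator, the full 40-character hex digest, the graph id string, and the smoothing factor alpha are assumptions; the real make_node_id and make_edge_id may combine their inputs differently.

```python
import hashlib


def make_node_id_sketch(graph_id: str, kind: str, label: str) -> str:
    # Same inputs always hash to the same id, so re-observing a node upserts it.
    raw = f"{graph_id}|{kind}|{label}".encode("utf-8")
    return hashlib.sha1(raw).hexdigest()


def make_edge_id_sketch(graph_id: str, source_id: str, target_id: str) -> str:
    raw = f"{graph_id}|{source_id}|{target_id}".encode("utf-8")
    return hashlib.sha1(raw).hexdigest()


def ewma(previous: float, observation: float, alpha: float = 0.2) -> float:
    """Exponentially weighted moving average; alpha is an assumed smoothing factor."""
    return (1.0 - alpha) * previous + alpha * observation


intent = make_node_id_sketch("intent_tool", "intent", "cortex:find_activity")
tool = make_node_id_sketch("intent_tool", "tool", "cortex:search")
edge = make_edge_id_sketch("intent_tool", intent, tool)
weight = ewma(previous=1.0, observation=0.0, alpha=0.5)  # one failure after a clean history
```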

apollo/learner/prompts.py — Prompt templates for the synthesis LLM (M8). One build_*_prompt per flavor; every template demands strict JSON output with a documented schema. Every per-type schema requires a top-level confidence: 0.0..1.0 field; the shared _SHARED_RULES block documents its semantics. - build_failure_pattern_prompt(observations, subgraph) — propose a FailurePattern from tool_error observations. - build_intent_pattern_prompt(observations, subgraph) — propose an IntentPattern from L1 prompts/intent schemas. - build_prompt_shim_prompt(intent_class, pain_points, subgraph) — propose a PromptShim that improves agent prompts for a given intent. - build_sweep_prompt(service, intent_class, observations, active_artifacts, subgraph) — continuous-sweep prompt that may return NoProposal when no useful signal. - build_coalesce_prompt(*, artifact_type, cluster) — Layer 6-C merger prompt: ask the LLM to write a single replacement artifact covering every cluster member's intent without redundancy.

apollo/learner/similarity.py — Layer 6-A + 6-B — artifact embedding + cosine similarity helpers. Pluggable embedder (defaults to axonis.memory.embedder); gracefully degrades to no embedding when sentence-transformers is unavailable. - set_embedder(fn), reset_embedder(), get_embedder() — pluggable embedder interface for tests / production. - text_for_embedding(content, artifact_type) — type-aware text extraction (PromptShim / FailurePattern / IntentPattern / ToolPairingHint / SpecFragment / ServiceConnectionHint). - compute_embedding(content, artifact_type) — embed; returns None when text is empty or the embedder is unavailable. - cosine_similarity(a, b) — pure-Python cosine; safe on empty / zero-magnitude / mismatched inputs. - find_similar_active_artifacts(*, proposal_embedding, proposal_type, proposal_applicability, active_set, threshold=None, self_artifact_id=None) — scope-filtered similarity scan, returned sorted desc by similarity.
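For reference, a pure-Python cosine similarity that tolerates degenerate inputs might look like the following. Returning 0.0 for empty, zero-magnitude, or mismatched-length vectors is an assumption about what "safe" means here; the helper name cosine_similarity_sketch is used to avoid implying this is the shipped function.

```python
import math
from typing import Optional, Sequence


def cosine_similarity_sketch(
    a: Optional[Sequence[float]], b: Optional[Sequence[float]]
) -> float:
    """Cosine of the angle between a and b; 0.0 when the comparison is undefined."""
    if not a or not b or len(a) != len(b):
        return 0.0  # missing, empty, or mismatched vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction
    return dot / (norm_a * norm_b)
```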

apollo/learner/snapshots.py — Hourly graph-state snapshots (§Learner → Snapshots and trajectory). Snapshots are the substrate for past-vs-current comparison and admin graph rollback. - service_from_label(label) — recover the emitter service from a namespaced graph-node label; returns "_all" for unprefixed/universal labels. - set_snapshot_writer(writer) — install a pluggable persistence writer (tests skip ES). - build_snapshot(graph, at, tier="hourly") — capture one graph's full in-memory state as a snapshot document. - snapshot_once(graph_set) — build + persist one snapshot per graph; emits an audit line. - rollback_to_snapshot(graph_id, snapshot_id) — restore prior graph state from a snapshot; M11 minimal implementation, returns bool. - run_periodic(graph_set, stop_event) — long-running coroutine; snapshots every APOLLO_GRAPH_SNAPSHOT_INTERVAL seconds, using bounded waits on stop_event so shutdown stays prompt.
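One common way to get a periodic loop that still shuts down promptly is to wait on the stop event with a timeout instead of sleeping. The sketch below assumes that pattern; the name run_periodic_sketch, the generic tick callable, and the choice to tick before the first wait are illustrative, not taken from snapshots.py.

```python
import asyncio
from typing import Awaitable, Callable


async def run_periodic_sketch(
    tick: Callable[[], Awaitable[None]],
    stop_event: asyncio.Event,
    interval_seconds: float,
) -> None:
    """Call `tick` every `interval_seconds` until `stop_event` is set."""
    while not stop_event.is_set():
        await tick()
        try:
            # Wakes immediately when stop_event is set, otherwise after the interval.
            await asyncio.wait_for(stop_event.wait(), timeout=interval_seconds)
        except asyncio.TimeoutError:
            continue  # interval elapsed without a stop request; run the next tick
```

With this shape, setting stop_event ends the loop within one scheduler turn even when the interval is an hour.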

apollo/learner/synthesis.py — Event-driven synthesis dispatcher (M8). Bridges observations to artifact proposals via LLM calls, coalesced per-trace, bounded by a semaphore, and gated by the drift check before reaching the pending list. - ProposalRecord — dataclass for one synthesis outcome (approved or drift_flagged); to_public(). - SynthesisEngine — singleton; set_graph_getter(getter), schedule(envelope) (event-driven entry), schedule_admin_initiated(scope) (POST /learn entry), pending_snapshot() (merges ES + in-memory), clear_pending(), remove_pending(proposal_id), run_sweep_once() (continuous sweep tick). - run_sweep_periodic(stop_event) — background loop driving run_sweep_once() on APOLLO_SYNTHESIS_SWEEP_INTERVAL_SEC cadence; gated by APOLLO_SYNTHESIS_SWEEP_ENABLED. - _NEUTRAL_CONFIDENCE = 0.5 — module-level neutral default; applied when an LLM proposal is missing or has an unparseable confidence. - _normalize_confidence(proposal) — coerce and clamp proposal['confidence'] into [0.0, 1.0]; mutates in place; called from _record_proposal.
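The coerce-and-clamp behavior described for _normalize_confidence can be sketched as follows. The public-looking names are hypothetical, and treating NaN as "unparseable" is an assumption; the neutral default of 0.5 and the in-place mutation come from the description above.

```python
import math

NEUTRAL_CONFIDENCE = 0.5  # mirrors the documented _NEUTRAL_CONFIDENCE value


def normalize_confidence(proposal: dict) -> dict:
    """Coerce proposal['confidence'] to a float in [0.0, 1.0], mutating the dict in place."""
    raw = proposal.get("confidence")
    try:
        value = float(raw)
    except (TypeError, ValueError):
        value = NEUTRAL_CONFIDENCE  # missing (None) or unparseable ("high")
    if math.isnan(value):
        value = NEUTRAL_CONFIDENCE  # assumption: NaN counts as unparseable
    proposal["confidence"] = min(1.0, max(0.0, value))
    return proposal
```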

apollo/learner/trajectory.py — Per-edge short-vs-long EWMA divergence projection — the primary drift signal. Pure math over the in-memory graph. - EdgeTrajectory — dataclass (edge_id, source/target ids, weight_short/long, divergence, count). - project(graph) — return a per-edge trajectory list sorted by abs(divergence) descending.
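To make the short-vs-long divergence concrete, the sketch below recomputes two EWMAs from a raw success/failure series and ranks edges by the absolute gap. The smoothing factors, the 0.5 starting weight, and the simplified EdgeTrajectorySketch dataclass (which omits the source/target ids) are assumptions; the real module reads the weights already stored on each edge.

```python
from dataclasses import dataclass
from typing import Mapping, Sequence


@dataclass
class EdgeTrajectorySketch:
    edge_id: str
    weight_short: float
    weight_long: float
    divergence: float  # weight_short - weight_long; negative means recent decline
    count: int


def ewma(samples: Sequence[float], alpha: float, start: float = 0.5) -> float:
    """Exponentially weighted moving average; larger alpha reacts faster."""
    weight = start
    for sample in samples:
        weight = alpha * sample + (1.0 - alpha) * weight
    return weight


def project_sketch(
    outcomes_by_edge: Mapping[str, Sequence[float]],
    alpha_short: float = 0.5,  # assumed fast smoothing factor
    alpha_long: float = 0.1,   # assumed slow smoothing factor
) -> list[EdgeTrajectorySketch]:
    rows = []
    for edge_id, samples in outcomes_by_edge.items():
        short = ewma(samples, alpha_short)
        long_ = ewma(samples, alpha_long)
        rows.append(EdgeTrajectorySketch(edge_id, short, long_, short - long_, len(samples)))
    # Largest absolute divergence first, matching the documented sort order.
    return sorted(rows, key=lambda row: abs(row.divergence), reverse=True)
```

An edge that succeeded for a long stretch and then started failing shows a strongly negative divergence, because the short EWMA drops well before the long one does.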

apollo/lineage/

apollo/lineage/__init__.py — Package marker (§7.3); re-exports persist_attach, persist_capped, aggregate_artifact_stats, query_capped_for_artifact, query_traces_with_artifact, query_trace_attribution, _persistence_disabled. Durability layer for cross-trace attribution complementing the in-memory registry.

apollo/lineage/persist.py — Schedules fire-and-forget ES writes of attach + cap events to apollo_lineage_events. Stays off the attach latency budget; no-ops under pytest or when no event loop is running. - persist_attach(trace_id, scope, artifact_ids) — schedule one denormalized row per (trace_id, scope, artifact_id) with kind: "attached"; idempotent via deterministic uid {trace_id}:{scope}:{aid}. - persist_capped(*, trace_id, scope, capped) — schedule one row per (artifact_id, artifact_type) pair with kind: "capped" and capped: uid-prefix so it coexists with attached rows on the same triple.
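The idempotency claim rests on the uid being a pure function of the row's identity. A sketch of the row construction, with the ES write replaced by a plain dict so that re-writing the same uid overwrites instead of duplicating; the helper names and the exact placement of the capped: prefix are assumptions.

```python
from typing import Iterable


def attach_rows(trace_id: str, scope: str, artifact_ids: Iterable[str]) -> dict[str, dict]:
    """One denormalized row per (trace_id, scope, artifact_id), keyed by a deterministic uid."""
    return {
        f"{trace_id}:{scope}:{aid}": {
            "trace_id": trace_id, "scope": scope, "artifact_id": aid, "kind": "attached",
        }
        for aid in artifact_ids
    }


def capped_rows(trace_id: str, scope: str, capped: Iterable[tuple[str, str]]) -> dict[str, dict]:
    """One row per (artifact_id, artifact_type); the uid prefix keeps it apart from attached rows."""
    return {
        f"capped:{trace_id}:{scope}:{aid}": {
            "trace_id": trace_id, "scope": scope, "artifact_id": aid,
            "artifact_type": artifact_type, "kind": "capped",
        }
        for aid, artifact_type in capped
    }
```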

apollo/lineage/queries.py — Retroactive lineage reads against apollo_lineage_events. The admin /lineage endpoint merges these with the in-memory AttributionRegistry. The attached-only queries filter out kind: "capped" rows so "applied" semantics are preserved. - query_traces_with_artifact(artifact_id, service_name=None, limit=500) — every persisted trace where the artifact was applied (excludes capped); deduplicates scopes per trace. - query_capped_for_artifact(artifact_id, *, service_name=None, limit=500) — every persisted trace where the cap held the artifact back; same shape as the attached query but kind=capped only. - query_trace_attribution(trace_id, service_name=None, limit=500) — full persisted attribution for one trace (excludes capped); returns None when no rows. - aggregate_artifact_stats(artifact_id, *, since=None, limit=1000): returns the {attached_count, capped_count, last_attached_at, last_capped_at} aggregate; powers the GET /artifacts/{id}/stats admin endpoint.

apollo/memory/

apollo/memory/__init__.py — Package marker; Elastic UDS-backed storage for observations, artifacts, graphs, and audit. No re-exports.

apollo/memory/bootstrap.py — Idempotent index bootstrap. ES auto-creates on first write but reads against a missing index 404; this runs at startup so reads on a fresh cluster don't 404. - ensure_indices() — create every Apollo index from apollo/templates/* that doesn't yet exist; returns {alias: status}. Handles multi-worker race via resource_already_exists_exception swallowing.

apollo/memory/queries.py — Bypasses for UDS.read (UDS routes through an on-disk template Apollo doesn't ship). Goes straight to the underlying ES client; falls through to store.read(...) for in-memory test fakes. - scan_all(store, size=10000) — enumerate every doc; 404 → empty dict. - get_by_id(store, uid) — single-doc fetch; returns {uid: doc} or {}. - search_by_query(store, query, size=1000) — filtered ES-shape query; 404 → empty dict.

apollo/memory/store.py — UDS subclasses for every Apollo Elastic index. Each owns a single alias registered in axonis-core's schema (except ApolloProposals, which is hardcoded to avoid a schema bump). - ApolloObservations, ApolloGraphNodes, ApolloGraphEdges, ApolloGraphSnapshots — observation and decision-graph stores (M1, M2). - ApolloArtifacts, ApolloArtifactHistory, ApolloAudit — Curator stores (M9). Artifacts holds current versions; history holds prior versions indefinitely; audit defaults to APOLLO_AUDIT_RETENTION_DAYS with indefinite=true records exempt. - ApolloLineageEvents — denormalized cross-trace attribution events (§7.3). - ApolloProposals — persistent synthesis pending list (M8) so multi-worker uvicorn deployments don't strand proposals per-worker. Overrides _index to avoid KeyError.

apollo/observer/

apollo/observer/__init__.py — Package marker; observation normalization and ingest entry point. No re-exports.

apollo/observer/events.py — Observation envelope and typed event-payload models. §Observation Model; token-level events are rejected — only turn-boundary/tool-invocation/error/final-response events accepted. - EventType(Enum) — full enum (must match §Observation Model → Event types exactly). - IntentSchemaPayload, UserPromptPayload, LLMTurnPayload, ToolOutputPayload, ToolErrorPayload, FinalResponsePayload, UserFeedbackPayload — per-event-type payload models (extra=allow). - CallerIdentity — minimal identity from token payload (username, roles, service); records "who the work is attributed to." - EmittedBy — server-stamped attribution of "who pushed the bytes" (token_subject, token_roles, context); auditors flag divergence from caller_identity.username. - ObservationEnvelope — unified envelope with field-validator that re-validates payload against the event-type model. - ObservationBatch — HTTP POST body shape for /observations.

apollo/observer/ingest.py — Async-queue-backed ingest loop (M1). Both in-process and HTTP intake converge here; workers drain the queue, dedup, write to ES, update graphs, and trigger synthesis + evaluator. - Module-level graph_set: GraphSet — the in-memory mirror of the five Decision Graphs. - startup(), shutdown() — lifecycle entry/exit called from oracle's Starlette lifespan. - reset_state() — testing helper. - set_writer(writer), set_graph_writer(writer) — install test writers. - ingest(envelope) — public entry point; put_nowait onto the bounded queue, returns False on queue-full; never blocks emitter tasks.
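The non-blocking contract of ingest(envelope) maps directly onto asyncio.Queue.put_nowait. The class below is a minimal sketch of that entry point only (no workers, dedup, or ES writes); the class name and the queue size are hypothetical.

```python
import asyncio
from typing import Any


class BoundedIngestSketch:
    """Accepts envelopes without ever blocking the caller; drops when the queue is full."""

    def __init__(self, maxsize: int = 1000) -> None:  # maxsize is an illustrative value
        self._queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)

    def ingest(self, envelope: Any) -> bool:
        try:
            self._queue.put_nowait(envelope)
            return True
        except asyncio.QueueFull:
            # Back-pressure is reported to the emitter instead of stalling its task.
            return False

    def pending(self) -> int:
        return self._queue.qsize()
```

Returning False instead of awaiting put() means an emitter under load loses observations rather than latency, which matches the "never blocks emitter tasks" guarantee.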

apollo/sdk/

apollo/sdk/__init__.py — Compat shim re-exporting the canonical SDK from axonis.apollo (ApolloClient, ApolloGuidanceCache, ApolloIntegration, ApolloMCPMiddleware, current_guidance). M5 moved the SDK into axonis-core; this re-export keeps oracle-internal imports working unchanged.

Other Apollo files in oracle/

server/middleware/trace.py — W3C traceparent ingress middleware (M6). Reads / mints / validates traceparent on every inbound request, installs it on the ambient ContextVar so downstream emitters and outbound dispatches propagate the same trace-id. Skips /health and /service-info. - TraceparentMiddleware — ASGI middleware; constructor takes optional header_name override; in best-effort mode mints replacements and increments missing/malformed counters; in required mode rejects with 400.
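The read / mint / validate decision can be sketched independently of ASGI. The code below follows the W3C Trace Context layout for version 00 (two hex digits of version, 32 of trace-id, 16 of parent-id, two of flags, all lowercase); the function names, and signalling the required-mode rejection by returning None, are assumptions rather than the middleware's actual interface.

```python
import re
import secrets
from typing import Optional

_TRACEPARENT_RE = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def is_valid_traceparent(value: Optional[str]) -> bool:
    match = _TRACEPARENT_RE.match(value or "")
    if match is None:
        return False
    version, trace_id, parent_id, _flags = match.groups()
    # The spec forbids version ff and all-zero trace/parent ids.
    return version != "ff" and trace_id != "0" * 32 and parent_id != "0" * 16


def mint_traceparent() -> str:
    # 16 random bytes for the trace-id, 8 for the parent-id, sampled flag set.
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"


def resolve_traceparent(header: Optional[str], required: bool = False) -> Optional[str]:
    """Return the traceparent to install, or None when the caller should answer 400."""
    if is_valid_traceparent(header):
        return header
    if required:
        return None  # required mode: missing or malformed headers are rejected
    return mint_traceparent()  # best-effort mode: replace and carry on
```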

server/llm/apollo_cache.py — Request-scoped Apollo guidance cache for oracle's chat LLM (L2). Oracle is a guidance subscriber for its own chat LLM, same as L1/L3; isolated per-request via ContextVar. - get_cache() — return the current request's ApolloGuidanceCache, or the empty sentinel. - set_cache(cache) — install a cache; returns a ContextVar token. - reset_cache(token) — restore the prior contextvar binding; idempotent. - populate_for_turn(user, intent_class, caller_tags, trace_id) — build a cache from the L2 attacher and install it; swallow-on-failure, returns the token.

Insomnia Test Flow

A literal, step-by-step protocol for exercising every Apollo HTTP endpoint via the Insomnia collection (developers-environment/Insomnia/APOLLO-API.yaml + AXONIS-Oracle.yaml for the /chat step). Doubles as the canonical request/response-contract reference for each endpoint: per step it pins the verb, URL, body/params, and the response shape to verify.

Format per step. Each numbered step lists exactly one request: which folder/name to fire, body / params (verbatim), and what to verify in the response.

Coverage. Every Apollo HTTP endpoint is exercised at least once. The full coverage table is at the end.

Setup

Pick a sub-environment in Insomnia (localhost - test, development - test, etc.). OAuth2 auto-applies the Bearer token to every request below.

Three env variables propagate across steps — update in the active sub-env when prompted:

| Variable | Set in step | Read in steps |
| --- | --- | --- |
| demo_memory_uid | 4 | 5, 7, 8 |
| demo_artifact_id | 12 | 13, 14, 15, 16, 18, 19, 21, 29, 32, 33, 44 |
| demo_longevity_prefix | 36 | 37, 43 |

Local longevity simulation (steps 36, 37, 43). The longevity-seed / verify / cleanup endpoints are gated behind APOLLO_ALLOW_LONGEVITY_SEED=true on the oracle process. The flag is exported by developers-environment/oracle/apollo.env, so sourcing the standard dev env stack already enables it:

source ../developers-environment/conf/development.axonis.ai.env
source ../developers-environment/oracle/oracle.env
source ../developers-environment/oracle/apollo.env
uv run python -m server

Override to false (or unset the var) on shared / production clusters. With the flag off, the gated endpoints return 403 with that hint in detail.

Steps 1–10 — stats, ingest, memory CRUD, learn

1. Stats → Stats: GET /api/v1/apollo/stats. Verify: 200; body status == "ok"; metrics block present; maintenance.last_run_at present.

2. Observation Ingest → Post Observations (batch): POST /api/v1/apollo/observations. Body (replace the default body to seed both event types):

{
  "observations": [
    {"event_type": "tool_output", "trace_id": "trc_demo_0001", "service": "cortex",
     "timestamp": "2026-05-12T15:00:00Z",
     "payload": {"tool_name": "summarize", "latency_ms": 312.4, "output_size_bytes": 1024}},
    {"event_type": "tool_error", "trace_id": "trc_demo_0002", "service": "cortex",
     "timestamp": "2026-05-12T15:00:01Z",
     "payload": {"tool_name": "summarize", "error_class": "ValidationError",
                 "error_message": "schema mismatch on input.cohort_id"}}
  ]
}

Verify: 202; body {"accepted": 2, "dropped": 0}.
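For scripted runs outside Insomnia, the same request can be assembled with the standard library. This is a sketch under assumptions: the base URL and token are placeholders supplied by the caller, and the helper names are hypothetical. The request is only built here; passing it to urllib.request.urlopen would send it.

```python
import json
from urllib.request import Request


def build_observations_request(base_url: str, token: str, observations: list) -> Request:
    """Assemble (but do not send) the batch POST for /api/v1/apollo/observations."""
    body = json.dumps({"observations": observations}).encode("utf-8")
    return Request(
        url=f"{base_url}/api/v1/apollo/observations",
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )


def ingest_fully_accepted(status: int, body: dict, sent: int) -> bool:
    """Mirror the step's verification: 202 with every observation accepted and none dropped."""
    return status == 202 and body == {"accepted": sent, "dropped": 0}
```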

3. Memories → List Memories: GET /api/v1/apollo/memories?service=cortex&limit=5. Verify: 200; both records from step 2 in observations[]; count >= 2.

4. Capture demo_memory_uid — Copy any observations[i]._id from the step 3 response into the sub-environment's demo_memory_uid.

5. Memories → Get Memory: GET /api/v1/apollo/memories/{{ _.demo_memory_uid }}. Verify: 200; same record as in step 3.

6. Memories → Seed Memory: POST /api/v1/apollo/memories. Body:

{"event_type": "tool_output", "trace_id": "trc_admin_seed_0001", "service": "oracle",
 "timestamp": null, "payload": {"tool_name": "synthetic_seed", "output_size_bytes": 0}}

Verify: 201; body {"accepted": true, "trace_id": "trc_admin_seed_0001", "event_type": "tool_output"}.

7. Memories → Patch Memory: PATCH /api/v1/apollo/memories/{{ _.demo_memory_uid }}. Body:

{"tags": ["under-review", "seeded-by-admin"], "admin_note": "Flagged during flow test."}

Verify: 200.

8. Memories → Delete Memory: DELETE /api/v1/apollo/memories/{{ _.demo_memory_uid }}. Verify: 200; body {"forgotten": true, "uid": "<demo_memory_uid>"}.

9. Memories → List Memories — attribution filters: GET /api/v1/apollo/memories?caller_username=test@axonis.ai&limit=10. Enable the caller_username query param. Verify: 200; every observation's caller_identity.username == "test@axonis.ai".

10. Learn → Trigger Synthesis: POST /api/v1/apollo/learn. Body:

{"intent_class": "entity_resolution", "service_name": "cortex",
 "note": "Probing the recent tool_error burst on cortex/summarize."}

Verify: 202; body {"accepted": true, "scope": {...}}. Wait ~2 seconds for the background synthesis task.

Steps 11–20 — artifacts, curator lifecycle, audit, divergence

11. Artifacts → List Artifacts: GET /api/v1/apollo/artifacts. Verify: 200; body {active: [...], pending: [...], count: {active, pending}}; at least one pending entry with status == "approved".

12. Capture demo_artifact_id — Copy any pending[i].id (where pending[i].status == "approved") into demo_artifact_id.

13. Artifacts → Promote Artifact: POST /api/v1/apollo/artifacts/{{ _.demo_artifact_id }}/promote. Body:

{"proposal_id": "<paste the same demo_artifact_id value>",
 "rationale": "Flow-test promote.", "admin_note": "step 13"}

Verify: 200; body is an ActionResult: action: "promote", version: 1, non-null audit_record_id, before_version_id: null, after_version_id: "<id>:v1".

14. Artifacts → Edit Artifact: PATCH /api/v1/apollo/artifacts/{{ _.demo_artifact_id }}. Body:

{"rationale": "Tighten applicability to cortex/summarize.",
 "applicability_patch": {"service_name": "cortex", "tool_name": "summarize"}}

Verify: 200; ActionResult with action: "edit", version: 2, before_version_id: "<id>:v1", after_version_id: "<id>:v2".

15. Artifacts → Rollback Artifact: POST /api/v1/apollo/artifacts/{{ _.demo_artifact_id }}/rollback. Body:

{"target_version": 1, "rationale": "Flow-test rollback to v1."}

Verify: 200; ActionResult with action: "rollback", version: 3, before_version_id: "<id>:v2", after_version_id: "<id>:v3". Underlying artifact content matches v1.

16. Artifacts → Demote Artifact: POST /api/v1/apollo/artifacts/{{ _.demo_artifact_id }}/demote. Body:

{"rationale": "Flow-test demote.", "evaluator_score": null,
 "score_decomposition": null, "upstream_artifact_ids": []}

Verify: 200; ActionResult with action: "demote", version: 4, after_version_id: "<id>:v4". Underlying doc's status flips to "demoted".

17. Recommendations → List Recommendations: GET /api/v1/apollo/recommendations. Verify: 200; body {recommendations: [...], count}. May be empty pre-Evaluator signal.

18. Audit → List Audit Records: GET /api/v1/apollo/audit?artifact_id={{ _.demo_artifact_id }}&limit=20. Enable the artifact_id filter. Verify: 200; records[] contains four entries in order: promote, edit, rollback, demote, all with actor == "admin:test@axonis.ai".

19. Provenance → Trace Artifact Provenance: GET /api/v1/apollo/provenance?artifact_id={{ _.demo_artifact_id }}. Verify: 200; body {artifact, audit, proposal, contributing_observations, contributing_services, note}. audit[0].action == "promote". proposal.trigger_event_type is one of user_prompt / tool_error / tool_output / sweep / admin_initiated.

20. Divergence → List Divergence: GET /api/v1/apollo/divergence. Verify: 200; body {records: [...], count, note}. May be empty if no service-on-behalf-of-user emits have occurred yet. note mentions service principals.

Steps 21–31 — lineage, guidance preview, SSE, admin chat, L1 chat

21. Lineage → Lineage by Artifact: GET /api/v1/apollo/lineage?artifact_id={{ _.demo_artifact_id }}&include_observations=true&observations_limit=50. Verify: 200; body {lineage, count, filter, note}. Each entry's source is one of live, persisted, or live+persisted.

22. Lineage → Lineage by Trace: GET /api/v1/apollo/lineage?trace_id=trc_demo_0001&include_observations=true. Verify: 200; one entry with trace_id == "trc_demo_0001", applied field present.

23. Guidance Preview → Preview L1 Guidance: GET /api/v1/apollo/guidance?scope=l1. Verify: 200; body {scope: "l1", guidance: <AttachedGuidance dict or null>}.

24. Guidance Preview → Preview L3 Guidance (cortex): GET /api/v1/apollo/guidance?scope=l3:cortex. Verify: 200; body {scope: "l3:cortex", guidance: <AttachedGuidance dict or null>}.

25. Admin SSE → List Subscribers: GET /api/v1/apollo/subscribers. Verify: 200; body {subscribers: [...], count}. Likely empty until step 26 connects.

26. Admin SSE → Subscribe to Guidance Stream: GET /api/v1/apollo/guidance/stream?scope=*. Open in a separate tab; Insomnia keeps the SSE response open. Verify: Stream stays connected. Re-running step 25 in another tab now lists this subscriber.

27. Admin Chat → List Tools: POST /api/v1/apollo/chat. Body: {"action": "list_tools"}. Verify: 200; body {tools: [...], milestone, mode}. Each tool entry carries name, description, parameters, mutating.

28. Admin Chat → Invoke Tool (direct): POST /api/v1/apollo/chat. Body:

{"action": "invoke", "tool": "list_decisions", "arguments": {"action": "demote", "limit": 20}}

Verify: 200; response carries the tool's raw return value (LLM bypassed).

29. Admin Chat → Chat (buffered): POST /api/v1/apollo/chat. Body:

{"action": "chat", "message": "Why was {{ _.demo_artifact_id }} demoted?",
 "conversation_id": "conv-flow-test-1"}

Verify: 200; body {response, tool_calls, iterations, conversation_id, timeout?}. The LLM should pick a tool that surfaces the audit row.

30. Admin Chat → Chat Stream (SSE): POST /api/v1/apollo/chat/stream. Body:

{"message": "List recent demote decisions.", "conversation_id": "conv-flow-test-stream"}

Verify: SSE events fire in order: session_start → tool_call → tool_result → response_delta* → response → done. On LLM failure, error replaces the trailing response → done pair.
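When scripting this check, the expected ordering (session_start, then tool_call and tool_result, then zero or more response_delta events, then response and done, with error standing in for the last two on failure) can be expressed as a pattern over the collected event names. The sketch below assumes the tool_call/tool_result pair may repeat or be absent, which the step does not state; the function name is hypothetical.

```python
import re

# Assumption: tool_call/tool_result pairs may occur zero or more times.
_PREFIX = r"session_start( tool_call tool_result)*( response_delta)*"
_SUCCESS = re.compile(_PREFIX + r" response done")
_FAILURE = re.compile(_PREFIX + r" error")


def classify_stream(event_names: list) -> str:
    """Return 'ok', 'error', or 'out_of_order' for a completed SSE event sequence."""
    joined = " ".join(event_names)
    if _SUCCESS.fullmatch(joined):
        return "ok"
    if _FAILURE.fullmatch(joined):
        return "error"
    return "out_of_order"
```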

31. AXONIS-Oracle → Oracle Gateway → Chat / LLM → Chat: POST /api/v1/chat. Body: default body in collection (message + intent_schema). Verify: 200; response includes apollo_guidance field on the envelope (null when no active artifacts match L1 scope, dict otherwise).

Steps 32–44 — capped lineage, stats aggregate, ops, effectiveness, cleanup

32. Lineage → Capped Lineage by Artifact: GET /api/v1/apollo/lineage/capped?artifact_id={{ _.demo_artifact_id }}&limit=200. Verify: 200; body {artifact_id, service_name_filter, records, count, note}. Each record carries {trace_id, scopes, registered_at}. Likely empty for a freshly-promoted demo artifact — cap pressure builds with real traffic.

33. Artifacts → Artifact Stats Aggregate: GET /api/v1/apollo/artifacts/{{ _.demo_artifact_id }}/stats. Verify: 200; body {artifact_id, attached_count, capped_count, last_attached_at, last_capped_at}. Numbers reflect the last 1000 lineage events for this artifact. Add ?since=2026-05-01T00:00:00Z to narrow the window.

34. Ops → Migrate Lineage Mapping: POST /api/v1/apollo/ops/migrate-lineage-mapping?dry_run=true. Verify: 200; body {dry_run: true, index, already_present: [...], would_add: [...]}. Re-fire without dry_run to perform the additive PUT _mapping that adds artifact_type + kind to the existing apollo_lineage_events index. Idempotent. Run this once per environment that pre-dates Layer 6.

35. Ops → Re-embed Artifacts: POST /api/v1/apollo/ops/reembed-artifacts?dry_run=true&limit=1000. Verify: 200; body {dry_run, processed, embedded, skipped_already_have, skipped_no_text, skipped_no_embedder, errors, updated_ids}. Re-fire with dry_run=false to write content.embedding_vector back to active artifacts that pre-date Layer 6-A. Idempotent. When sentence-transformers isn't installed, every artifact lands in skipped_no_embedder.

36. Ops → Longevity Seed: POST /api/v1/apollo/ops/longevity-seed?dry_run=true&days=30&observations=2000&proposals=60&audit=60&lineage=1500&artifacts=40. Prereq: APOLLO_ALLOW_LONGEVITY_SEED=true. Verify: 200; body {dry_run, run_prefix, since, until, days, observations:{...}, proposals:{...}, audit:{...}, lineage:{attached,capped}, artifacts:{...}, cleanup_hint}. Re-fire with dry_run=false to write the backdated synthetic data. Copy run_prefix into _.demo_longevity_prefix for steps 37 and 43.

37. Ops → Longevity Verify: GET /api/v1/apollo/ops/longevity-verify?run_prefix={{ _.demo_longevity_prefix }}. Prereq: same APOLLO_ALLOW_LONGEVITY_SEED=true. Verify: 200; body {run_prefix, total_docs, docs_by_index}. docs_by_index totals should equal the planned counts in step 36's response. A mismatch indicates a per-store write failure.

38. Effectiveness → Summary (1d): GET /api/v1/apollo/effectiveness/summary?window=1d. Verify: 200; smallest of the four window snapshots.

39. Effectiveness → Summary (7d): GET /api/v1/apollo/effectiveness/summary?window=7d. Verify: 200; each section's count ≥ step 38's matching count.

40. Effectiveness → Summary (30d): GET /api/v1/apollo/effectiveness/summary?window=30d. Verify: 200; primary read. After step 36's seed, every section shows the seeded counts as a floor (never assert equality, only floor). One section per surface — observations, synthesis, curator, attach, artifacts, evaluator — plus the resolved window, since, until. Supports window=1d|7d|30d|90d, or override with since=ISO8601 (+ optional until).

41. Effectiveness → Summary (90d): GET /api/v1/apollo/effectiveness/summary?window=90d. Verify: 200; widest of the four window snapshots. With a 30-day seed this matches step 40's counts.

42. Effectiveness → Trend (1d/7d/30d/90d): GET /api/v1/apollo/effectiveness/trend?windows=1d,7d,30d,90d. Verify: 200; body {buckets, skipped, as_of, rollups:{1d, 7d, 30d, 90d}}. Each entry in rollups carries the same shape as /effectiveness/summary. Healthy Apollo has observations.total / synthesis.proposals_created / curator.total_actions / attach.attached_total monotonically non-decreasing as windows widen.
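The monotonicity expectation for the trend response can be checked mechanically. The sketch below assumes the dotted metric names map to nested keys inside each rollup (rollups["7d"]["observations"]["total"] and so on), which is inferred from the names rather than confirmed; the function name is hypothetical.

```python
WINDOW_ORDER = ("1d", "7d", "30d", "90d")
METRICS = (
    ("observations", "total"),
    ("synthesis", "proposals_created"),
    ("curator", "total_actions"),
    ("attach", "attached_total"),
)


def shrinking_metrics(rollups: dict) -> list:
    """Return the metrics whose value decreases when moving to a wider window."""
    problems = []
    for section, field in METRICS:
        # Collect the metric across windows, narrowest first, skipping absent windows.
        values = [rollups[w][section][field] for w in WINDOW_ORDER if w in rollups]
        if any(wider < narrower for narrower, wider in zip(values, values[1:])):
            problems.append(f"{section}.{field}")
    return problems
```

An empty return value corresponds to the healthy case; any listed metric points at a window whose aggregation missed documents that a narrower window counted.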

43. Ops → Longevity Cleanup: POST /api/v1/apollo/ops/longevity-cleanup?run_prefix={{ _.demo_longevity_prefix }}. Prereq: same APOLLO_ALLOW_LONGEVITY_SEED=true. Verify: 200; body {run_prefix, total_deleted, deleted_by_index}. Re-firing on an already-cleaned prefix returns total_deleted: 0. After cleanup, re-fire step 37 — it should report total_docs: 0.

44. Artifacts → Forget Artifact — cleanup: DELETE /api/v1/apollo/artifacts/{{ _.demo_artifact_id }}. Body:

{"rationale": "Flow-test cleanup — irreversible delete."}

Verify: 200; subsequent GET /artifacts no longer surfaces the id in either active or pending.

Coverage check

Every endpoint in apollo/admin/api.py + apollo/guidance/api.py + apollo/chat/server.py is exercised at least once:

| Endpoint | Step |
| --- | --- |
| GET /stats | 1 |
| POST /observations | 2 |
| GET /memories | 3, 9 |
| GET /memories/{uid} | 5 |
| POST /memories | 6 |
| PATCH /memories/{uid} | 7 |
| DELETE /memories/{uid} | 8 |
| POST /learn | 10 |
| GET /artifacts | 11 |
| POST /artifacts/{id}/promote | 13 |
| PATCH /artifacts/{id} | 14 |
| POST /artifacts/{id}/rollback | 15 |
| POST /artifacts/{id}/demote | 16 |
| GET /recommendations | 17 |
| GET /audit | 18 |
| GET /provenance | 19 |
| GET /divergence | 20 |
| GET /lineage | 21, 22 |
| GET /guidance | 23, 24 |
| GET /subscribers | 25 |
| GET /guidance/stream | 26 |
| POST /chat (3 actions) | 27, 28, 29 |
| POST /chat/stream | 30 |
| POST /api/v1/chat (L1 attach) | 31 |
| GET /lineage/capped | 32 |
| GET /artifacts/{id}/stats | 33 |
| POST /ops/migrate-lineage-mapping | 34 |
| POST /ops/reembed-artifacts | 35 |
| POST /ops/longevity-seed | 36 |
| GET /ops/longevity-verify | 37 |
| GET /effectiveness/summary | 38, 39, 40, 41 |
| GET /effectiveness/trend | 42 |
| POST /ops/longevity-cleanup | 43 |
| DELETE /artifacts/{id} | 44 |

Auth-failure smoke check (optional)

The OAuth token is auto-applied to every endpoint above. To verify the auth gate, open a separate tab, remove the Authorization header, and re-fire any admin endpoint. Expect 401 Unauthorized for a missing or invalid token; a valid token that lacks the atlasfl-admin role returns 403 Forbidden instead.

Failure modes

| Symptom | Diagnosis |
| --- | --- |
| 401 on every step | Token expired — re-auth via Insomnia's OAuth panel. |
| 403 forbidden on mutations | Caller's token lacks atlasfl-admin. |
| 403 curator_paused | Curator is paused. POST a resume via step 27/28 first. |
| 400 on step 2 | Envelope failed schema validation; detail lists the bad fields. |
| 400 APOLLO_REQUIRE_INTENT_SCHEMA on step 31 | Required-mode flip is on (§Q12). Add an intent_schema block to the body. |
| 503 on step 31 | No LLM provider configured. Set ANTHROPIC_API_KEY / OPENAI_API_KEY / GROQ_API_KEY. |
| Empty pending: [] after step 10 | Synthesis produced no proposals. Push more observations (step 2) and retry. |
| 404 on step 13 | demo_artifact_id doesn't match any pending proposal. Re-run step 11 and re-capture. |
| 409 on step 13 | Proposal status isn't approved (likely drift_flagged). Pick a different pending entry. |

Future Improvements & Considerations

A living backlog of deferred work, hygiene items, and design considerations that came up during the M0–M14 build but were not addressed immediately. Each entry includes the context, the proposed change, rough effort, and priority so a future engineer can pick an item up without re-discovering why it exists.

Beacon (L1) wiring is deferred — beacon has no HTTP connection to oracle today (MCP_SERVER_URL defaults to cortex direct, see §2.1 below), so attached apollo_guidance has no path into beacon's process. Tracked as item §2.3 below. Parallax was originally a Phase 1 subscriber but is also deferred; its wiring follows the cortex pattern when it onboards.

What this is NOT. This is not a bug list — every item here is either (a) deliberately deferred with a design rationale, (b) hygiene that ships-ready code can tolerate, or (c) an enhancement waiting on production signal. Anything urgent should be an issue, not an entry here.

Cross-references. Items marked "SPEC-OOS" are already tracked in §Implementation Plan → Out of scope; they're restated here with operational framing.

Legend

| Priority | Meaning |
| --- | --- |
| P1 — Soon | Would meaningfully improve ops/DX; pick up when a relevant milestone lands nearby |
| P2 — Watch | Not blocking today; flip to P1 if production data shows pressure |
| P3 — Future phase | Tracked for spec completeness; waits on prerequisites outside Apollo's scope |

| Effort | Meaning |
| --- | --- |
| S | < half day |
| M | 1–2 days |
| L | multi-day |

Currently active

Quick-reference of items that have not shipped. Items marked ✅ in the body below are historical and grouped under §Changelog (Layer items in §12).

| Item | Priority | Effort | One-line |
| --- | --- | --- | --- |
| §2.1 — Beacon default MCP_SERVER_URL bypasses oracle | P2 | S/M | Config docs (S) or change default (M) |
| §2.3 — Beacon (L1) onboarding to oracle's chat surface | P1 | M | First L1 caller wiring; gates §4.1's required-mode flip |
| §2.4 — Distribution model for subscriber-facing axonis-core modules | P2 | S | Decide: pip extra vs always-installed |
| §2.5 — Additional L3 emitter onboarding | P2 | M | Parallax / prism / sentinel after the cortex pattern |
| §3.1 — Production-grade minimax-local provider | P2 | mixed | Knobs 1-3 shipped; knobs 4-5 deferred |
| §4.1 — Flip APOLLO_REQUIRE_INTENT_SCHEMA=true | P2 | S | Needs §2.3 (beacon L1) + emitter coverage proof |
| §4.2 — Flip APOLLO_REQUIRE_TRACEPARENT=true | P2 | S | Same gating as §4.1 |
| §8.1 — Keycloak client-credentials grant | P2 | M | Replace pre-populated AUTHORIZATION with proper grant |
| §10.1 — Design-journey screenshot / gif | P3 | S | Presentation polish |
| §11.1 — Absorb oracle's existing memory modules into Apollo | P3 | | Effectively resolved organically (see body) |
| §12.9 — Cap-defaults empirical study | 🚧 In flight | | Wait for telemetry; not code work |

2 · Service integration

2.1 — Beacon's default MCP_SERVER_URL bypasses oracle

Priority: P2 — Watch · Effort: S (config doc); M (change the default).

Context. developers-environment/beacon/beacon.env ships with MCP_SERVER_URL=http://localhost:8000/mcp — pointing at cortex directly. In that configuration beacon never reaches oracle; Apollo sees zero traffic.

For the Apollo scenario to work from beacon (not from curl), the operator must override to http://localhost:8001/agentspace/mcp.

Proposed change. Two options: (1) Doc-only (current path) — §Scenario calls this out; (2) Change the default — flip beacon.env's default to oracle's MCP. Option 2 is the right production choice but a breaking change for anyone running beacon alongside cortex without oracle. Left at option 1 until a production deployment exercises this end-to-end.

Unblocks. Beacon's /chat UI becomes the natural demo surface for Apollo without per-deployment config gymnastics.

2.3 — Beacon (L1) onboarding to oracle's chat surface

Priority: P1 — Soon · Effort: L (architecture decision + implementation).

Direction. Oracle's POST /api/v1/chat is the production user-facing chat surface, driven by oracle's own LLM tool-use loop. It is the L1 surface for any client that wants Apollo guidance applied automatically. Beacon currently streams direct to upstream LLM providers and does not call oracle. Closing the loop on the L1 attach side requires routing beacon through /api/v1/chat — at which point beacon receives apollo_guidance in every response body and feeds its local ApolloGuidanceCache.update(...), exactly as the L1 contract from M3 + M5 specifies.

POST /api/v1/apollo/chat remains a separate admin-scoped surface that runs Apollo's independent MiniMax LLM for talking to Apollo. It is not the L1 path and is not on a deprecation track.

Open implementation details:

1. Beacon's transport. (a) beacon's backend becomes a thin streaming proxy to /api/v1/chat so beacon's existing chat UI keeps working with no UX change; or (b) keep beacon's direct-to-LLM path and have beacon pull guidance via a separate endpoint (GET /api/v1/apollo/guidance?scope=l1, currently admin-only, so it would need its own role relaxation). (a) is the simpler model.
2. Streaming. /api/v1/chat returns a buffered ChatResponse today; production beacon UX needs streaming.
3. Conversation persistence. Already wired on /api/v1/chat via oracle/server/memory/conversation.py (Redis-backed). Beacon onboarding inherits it for free.
4. Tool catalog visibility. Decide whether tool_calls should be exposed in the streaming response so beacon can render them.

Trigger. First production deployment that needs beacon to consume Apollo guidance via L1 attach.

2.4 — Distribution model for subscriber-facing axonis-core modules

Status: On hold pending trigger. Resolved for cortex: it depends on axonis-core>=0.1.0 directly and imports ApolloGuidanceCache from the canonical post-flatten path (axonis.apollo.guidance_cache).

What's open (neither question has a consumer pushing on it today):
1. Split a thin axonis-sdk sub-package? ApolloGuidanceCache and Spec (formerly LLMSpec) are pure-stdlib. A thin package would let truly isolated agents depend on a small surface without taking the full axonis-core.
2. Or move axonis-core's heavy deps to optional extras? axonis-core[elastic], axonis-core[transport], etc. Bigger refactor; affects every existing consumer.

Trigger to revisit: parallax or beacon cannot take on axonis-core, exercising Q15's vendoring branch, or a third pure-stdlib SDK module joins the family. Neither has happened.

2.5 — Additional L3 emitter onboarding

Status: Owned by individual service teams. Each service declares its own component_kind and (if needed) installs ApolloMCPMiddleware. No Apollo code change required.

Per-service onboarding state (audited 2026-05-12):

| Service | Apollo refs in source | Path to onboard |
| --- | --- | --- |
| parallax | 1 file | Closest to ready. Same shape as cortex: install ApolloMCPMiddleware in its __main__.py and declare component_kind="agent". |
| athena | 0 files | Library-kind. Declare component_kind="library"; Apollo filters it out of guidance attach. Oracle observes the MCP round-trip and emits on its behalf. |
| testament | 0 files | Same as athena (likely library-kind). Owning team confirms. |
| titan | 0 files | Same; owning team confirms component_kind. |
| UDS, rest/fedai-rest | n/a | Library-kind by design. |

Integration paths (unchanged from §Ingest Semantics):
- In-process relay (default): the service is reachable from oracle's MCP dispatch. No code change in the service; just declare component_kind.
- Direct POST via ApolloClient (fallback): the service runs outside oracle's MCP dispatch reach. The service imports ApolloClient and POSTs to /api/v1/apollo/observations.

Trigger. Each service's owning team picks this up when they're ready. Apollo's side requires no work.

3 · LLM provider hardening

3.1 — Production-grade minimax-local provider

Status: Knobs 1–3 shipped 2026-05-12; knobs 4–5 still deferred (no consumer pushing on them — production uses APOLLO_LLM_PROVIDER=openai with APOLLO_LLM_BASE_URL pointed at a hosted MiniMax endpoint; the local provider is a dev / air-gapped-lab fallback). M8 shipped the canonical HF load signature; recent additions are backward-compatible.

| # | Knob | Status | Env override / notes |
| --- | --- | --- | --- |
| 1 | Custom model path | ✅ Shipped | APOLLO_LLM_LOCAL_MODEL_PATH: absolute path on a mounted shared fs. Empty → HF cache default. |
| 2 | Thread-pool offload of the HF forward pass | ✅ Shipped | asyncio.to_thread wraps both complete() and the buffered stream() fallback. |
| 3 | Device mapping + quantization knobs | ✅ Shipped | APOLLO_LLM_LOCAL_DEVICE_MAP, APOLLO_LLM_LOCAL_TORCH_DTYPE, APOLLO_LLM_LOCAL_LOAD_IN_8BIT, APOLLO_LLM_LOCAL_LOAD_IN_4BIT. 4-bit wins if both quantization flags are true. |
| 4 | Pre-pull orchestration + readiness gate | ⏸ Deferred | Block worker start until the ~40GB checkpoint is resident and a warm-up forward pass has succeeded. Affects oracle's lifespan. |
| 5 | Streaming tokens through LLMClient | ⏸ Deferred | Required for admin-chat UX. Affects every provider's contract, so it should be a separate cross-provider change. |
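The precedence rule for knob 3 can be illustrated with a small resolver. This is only a sketch: the env var names come from the table above, but the function name, the returned dict shape, and the set of strings treated as "true" are assumptions made for illustration, not the provider's actual code.

```python
import os
from typing import Mapping, Optional

# Assumption: which strings count as "true" is not specified in this document.
_TRUE = {"1", "true", "yes", "on"}


def _flag(env: Mapping[str, str], name: str) -> bool:
    return env.get(name, "").strip().lower() in _TRUE


def resolve_local_load_options(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: turn the APOLLO_LLM_LOCAL_* knobs into load options."""
    env = os.environ if env is None else env
    options = {
        # Empty path means "fall back to the HF cache default".
        "model_path": env.get("APOLLO_LLM_LOCAL_MODEL_PATH", "").strip() or None,
        "device_map": env.get("APOLLO_LLM_LOCAL_DEVICE_MAP", "").strip() or None,
        "torch_dtype": env.get("APOLLO_LLM_LOCAL_TORCH_DTYPE", "").strip() or None,
    }
    # 4-bit is checked first, so it wins when both quantization flags are true.
    if _flag(env, "APOLLO_LLM_LOCAL_LOAD_IN_4BIT"):
        options["quantization"] = "4bit"
    elif _flag(env, "APOLLO_LLM_LOCAL_LOAD_IN_8BIT"):
        options["quantization"] = "8bit"
    else:
        options["quantization"] = None
    return options
```

Passing an explicit mapping instead of reading os.environ directly keeps the precedence rule easy to pin in a unit test.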

Trigger for knobs 4 + 5. An operator commits to an on-prem MiniMax deployment (#4) or admin-chat surfaces a UX need for streaming (#5).

Knob 4 crosses from the provider into oracle's lifespan (gate the ready signal, distinguish "process up" from "model loaded", Kubernetes readinessProbe coordination); the ready-threshold semantics are unspecified without a real consumer. Knob 5 affects every provider's contract — OpenAI/Anthropic stream natively, minimax-local has a buffered single-chunk fallback; a unified streaming abstraction is a cross-provider design exercise. Both punted until there's a real consumer to design against.
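To make the "process up" versus "model loaded" distinction concrete, here is a minimal sketch of a readiness gate. The class and method names are hypothetical, and the ready-threshold semantics remain unspecified as noted above; this sketch simply assumes "ready" means the load and one warm-up call both returned without raising. It also uses asyncio.to_thread, the same offload mechanism as knob 2.

```python
import asyncio
from typing import Any, Callable, Optional, Tuple


class ReadinessGate:
    """Hypothetical gate separating liveness from model readiness."""

    def __init__(self) -> None:
        self.model_loaded = False
        self.error: Optional[str] = None

    def liveness(self) -> Tuple[int, dict]:
        # The process answering at all is enough for liveness.
        return 200, {"status": "up"}

    def readiness(self) -> Tuple[int, dict]:
        # Readiness stays 503 until load + warm-up have both succeeded.
        if self.model_loaded:
            return 200, {"status": "ready"}
        return 503, {"status": "loading", "error": self.error}

    async def load(
        self,
        load_model: Callable[[], Any],
        warm_up: Callable[[Any], Any],
    ) -> None:
        try:
            # Both calls are blocking, so run them off the event loop.
            model = await asyncio.to_thread(load_model)
            await asyncio.to_thread(warm_up, model)
        except Exception as exc:  # keep the failure visible to the probe
            self.error = repr(exc)
            return
        self.model_loaded = True
```

A Kubernetes readinessProbe would poll whatever endpoint exposes readiness(), while a livenessProbe would use liveness(); how oracle's lifespan wires this in is exactly the open design question.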

4 · Configuration + required-mode flips

4.1 — Flip APOLLO_REQUIRE_INTENT_SCHEMA=true

Status: Awaiting production coverage signal; enforcement code now in place. Phase-1 audit found the env var was declared but no handler read it; fixed 2026-05-12:
- New intent_schema: dict | None = None field on ChatRequest.
- /chat handler checks the flag at ingress and returns 400 with an error detail referencing the field when the flag is true and the field is missing.
- New apollo.hooks.chat.emit_intent_schema(...) helper. When the request carries an intent_schema block, oracle emits the observation before emit_user_prompt so both share the same trace id.
- Tests pin all four cases.
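The ingress check and emit ordering described above can be sketched as follows. This is an illustration only: the message field, the helper names, and the error text are assumptions, and treating only None as "missing" is also an assumption (the handler's real rule for empty dicts is not stated here).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ChatRequest:
    message: str  # hypothetical field, for illustration only
    intent_schema: Optional[dict] = None


def check_intent_schema(request: ChatRequest, require: bool) -> Optional[Tuple[int, str]]:
    """Return a (status, detail) rejection, or None when the request may proceed."""
    if require and request.intent_schema is None:
        return 400, "intent_schema is required when APOLLO_REQUIRE_INTENT_SCHEMA=true"
    return None


def planned_emits(request: ChatRequest) -> List[str]:
    """Order of observations: intent schema (if present) before the user prompt."""
    emits = []
    if request.intent_schema is not None:
        emits.append("intent_schema")
    emits.append("user_prompt")
    return emits
```

The four cases are the cross product of flag on/off and field present/missing; only "flag on, field missing" is rejected.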

Remaining work to flip in production. Watch /stats → intent_schema_coverage ≥ 0.90 for a rolling 7-day window, then set the flag. One env-var change; no Apollo code change.
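One way to read the flip criterion is "every daily coverage sample in the last seven days is at or above 0.90". That interpretation, the function name, and the idea of sampling /stats once per day are all assumptions; a windowed average would be an equally plausible reading.

```python
from typing import Sequence


def ready_to_flip(
    daily_coverage: Sequence[float],
    threshold: float = 0.90,
    window_days: int = 7,
) -> bool:
    """True when the most recent `window_days` samples all meet the threshold."""
    if len(daily_coverage) < window_days:
        return False  # not enough history to judge a full window
    return all(sample >= threshold for sample in daily_coverage[-window_days:])
```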

4.2 — Flip APOLLO_REQUIRE_TRACEPARENT=true

Status: Awaiting emitter coverage signal; enforcement code already in place. Verified 2026-05-12: server/middleware/trace.py:71 reads the flag and returns 400 with a (missing|malformed) detail when required. Best-effort mode (default) mints a replacement and counts apollo_missing_traceparent_total / apollo_malformed_traceparent_total.
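The two modes can be sketched against the W3C Trace Context header layout (version, 32-hex trace id, 16-hex parent id, 2-hex flags). This is not the code in server/middleware/trace.py: the function names, the error text, and the choice to increment counters only in best-effort mode are assumptions.

```python
import re
import secrets
from collections import Counter
from typing import Optional, Tuple

# Layout per W3C Trace Context: version-traceid-parentid-flags, lowercase hex.
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")

counters: Counter = Counter()  # stand-in for the Prometheus counters


def classify(header: Optional[str]) -> str:
    """Return "ok", "missing", or "malformed"."""
    if header is None or not header.strip():
        return "missing"
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return "malformed"
    version, trace_id, parent_id, _flags = match.groups()
    # Version ff and all-zero ids are invalid per the spec.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return "malformed"
    return "ok"


def mint() -> str:
    """Mint a fresh, sampled traceparent."""
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"


def handle(header: Optional[str], require: bool) -> Tuple[int, str]:
    """Return (200, traceparent to use) or (400, error detail)."""
    verdict = classify(header)
    if verdict == "ok":
        return 200, header.strip()
    if require:
        return 400, f"{verdict} traceparent"
    counters[f"apollo_{verdict}_traceparent_total"] += 1
    return 200, mint()
```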

Remaining work to flip in production. Watch apollo_missing_traceparent_total rate near zero for a rolling 7-day window, then set the flag.

5 · Maintenance + retention

(No active items — snapshot tier-generation is tracked under §Implementation Plan post-M13 deferred work; the maintenance loop's purge path shipped in M13.)

7 · Observation + audit

7.3 — Persistent attributions for retroactive lineage (✅ Shipped)

Status. Shipped 2026-05-09. apollo_lineage_events Elastic index, apollo/lineage/{persist,queries}.py module, and /lineage extended to merge live (AttributionRegistry) and persisted sources. Entries tagged source: live | persisted | live+persisted. Retention bounded by APOLLO_LINEAGE_RETENTION_DAYS (default 90).
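The live/persisted merge and its source tag can be illustrated with a small function. The entry shape and the id key used for matching are hypothetical; the real /lineage handler may key entries differently.

```python
from typing import Dict, Iterable, List


def merge_lineage(live: Iterable[dict], persisted: Iterable[dict]) -> List[dict]:
    """Merge two lineage sources, tagging each entry with where it was seen."""
    merged: Dict[str, dict] = {}
    for entry in persisted:
        merged[entry["id"]] = {**entry, "source": "persisted"}
    for entry in live:
        key = entry["id"]
        if key in merged:
            # Seen in both: live fields take precedence in this sketch.
            merged[key] = {**merged[key], **entry, "source": "live+persisted"}
        else:
            merged[key] = {**entry, "source": "live"}
    return list(merged.values())
```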

8 · Keycloak + auth

8.1 — Keycloak client-credentials grant for service-to-service auth

Status: Blocked on platform-level Keycloak work (tracked in SPEC-PLATFORM-03 as pending). Apollo's side is already done: the existing emitted_by attribution path (shipped 2026-05-11) correctly handles service-principal tokens. When a request authenticates with a token whose roles include "service", both caller_identity and emitted_by are stamped from the token without special casing. The /divergence audit endpoint already treats service-role emits as legitimate divergence rather than forging.

What happens when Keycloak's grant lands. An operator does the following with no Apollo code change:
1. Configure Keycloak to issue a client-credentials grant for an apollo-emitter service account.
2. Set APOLLO_SERVICE_TOKEN=<token> on background-worker environments.
3. Workers POST to /api/v1/apollo/observations with Authorization: Bearer $APOLLO_SERVICE_TOKEN.
4. Oracle's OAuthMiddleware validates the token through normal Keycloak introspection.
5. Apollo's ingest handler stamps emitted_by.token_subject = "apollo-emitter@service.axonis.ai" and emitted_by.token_roles = ["service"].
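Step 3 on the worker side might look like the sketch below, which only builds the request and does not send it. The helper name, the base URL, and the observation payload are hypothetical; the endpoint path, header, and env var name come from the steps above.

```python
import json
import os
import urllib.request
from typing import Optional


def build_observation_request(
    base_url: str,
    observation: dict,
    token: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a service-authenticated observation POST."""
    token = token if token is not None else os.environ.get("APOLLO_SERVICE_TOKEN", "")
    if not token:
        raise RuntimeError("APOLLO_SERVICE_TOKEN is not set")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/apollo/observations",
        data=json.dumps(observation).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

A worker would pass the result to urllib.request.urlopen (or an equivalent HTTP client); token validation and the emitted_by stamping then happen server-side.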

Auditors can then flag any divergence via GET /divergence.

Unblocks. Background observation ingest without user context. Scheduled synthesis jobs. Federation of artifacts.

9 · Test coverage

9.1 — Live deployment integration test (✅ Shipped)

Status. Shipped 2026-05-12 at oracle/tests/integration/test_live_scenario.py. The whole module is pytest.skip-ped unless APOLLO_LIVE_TEST=true. A staging CI job sets the flag plus the APOLLO_LIVE_* env vars to exercise the live path.

Covers: Oracle /health reachable; Apollo /stats returns status: "ok" with a real Bearer token; round-trip POST synthetic observation → assert it appears in /memories?trace_id=… → clean up; /chat returns the apollo_guidance field. Auth uses Keycloak client-credentials first, falls back to password grant (see §8.1).

10 · Spec + docs hygiene

10.1 — Design-journey screenshot / gif for presentations

Status: Requires a recording session, not a code change. Operator runs the §Scenario Step 6 flow (admin chat "promote it") against a live cluster while screen-recording the SSE terminal showing the fan-out, then commits a ~30-second gif + a link from §Design Journey.

10.2 — Full-spec index in specs/ (✅ Shipped)

Status. Shipped 2026-05-12 at oracle/specs/README.md. One-paragraph-per-doc index keyed by audience, plus pointers to the adjacent docs under oracle/docs/.

11 · Memory module consolidation

11.1 — Absorb oracle's existing memory modules into Apollo

Status: Effectively resolved organically. When the original entry was written, three oracle-local modules overlapped with Apollo's surface. Status per module, audited 2026-05-12:

| Original module | Today |
| --- | --- |
| oracle/server/memory/conversation.py | Live, but a 12-line re-export shim: from axonis.memory.store import Store as ConversationStore. The shim survives because oracle's /chat and tests import ConversationStore from the oracle-local path. Replacing it with a direct axonis-core import is mechanical when the rename ripple settles. |
| oracle/server/memory/cross_service.py | Already deleted. No remaining callers. The strict-per-service MemoryService model (Invariant 17, shipped 2026-05-11) + Apollo's cross-service guidance channel cover what this used to do. |
| oracle/server/models/memory.py | Deleted 2026-05-12. The directory had become dead: server/models/__init__.py was importing a removed module, with zero consumers. Removed the whole server/models/ directory. |

Trigger for further work. None — what remains is the load-bearing ConversationStore shim, which is intentional (call-site stability across the axonis-core rename).

12 · Attach prioritization

Mostly historical. Items 12.1–12.8 shipped 2026-05-18 (the seven-layer prioritization rebuild). Only §12.9 (cap-defaults empirical study) is still in flight, and that's a data-collection task, not code. The golden-state contract for these layers is §Prioritization Layers.

12.1 — Capped-artifact observability (✅ Shipped)

Status. Shipped 2026-05-18. When the per-type attach cap holds an artifact back, a row lands in apollo_lineage_events with kind: "capped" + artifact_type + scope + trace_id. New queries: query_capped_for_artifact, aggregate_artifact_stats. New admin endpoints: GET /api/v1/apollo/lineage/capped, GET /api/v1/apollo/artifacts/{id}/stats. The pre-existing attached-only queries filter capped rows out by default.

12.2 — Sort key priority chain + per-type caps (✅ Shipped)

Status. Shipped 2026-05-18. apollo.guidance.attacher._sort_key consults evaluator_score → confidence → applicability specificity → weight → as_of. Mirror key in axonis.apollo.guidance_cache._priority_key. Per-type caps (APOLLO_ATTACH_CAP_*) keep the wire payload bounded; cap drops are deterministic (lowest-weight first) and counted by apollo_guidance_attach_capped_total.
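A priority chain like this is commonly expressed as a tuple sort key, with per-type caps applied while walking the ranked list. The sketch below assumes higher values rank first for every signal, missing signals rank lowest, as_of is an ISO-8601 string, and the artifact dict shape and default cap are hypothetical. It drops the lowest-ranked artifacts by the full key, which may differ in detail from the shipped tie-breaking.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def sort_key(artifact: dict) -> tuple:
    """evaluator_score, then confidence, specificity, weight, as_of."""
    return (
        artifact.get("evaluator_score") or 0.0,
        artifact.get("confidence") or 0.0,
        artifact.get("specificity") or 0,
        artifact.get("weight") or 0.0,
        artifact.get("as_of") or "",  # ISO-8601 strings sort chronologically
    )


def attach_with_caps(
    artifacts: Iterable[dict],
    caps: Dict[str, int],
    default_cap: int = 3,  # assumption: real defaults come from APOLLO_ATTACH_CAP_*
) -> Tuple[List[dict], List[dict]]:
    """Return (attached, capped), keeping at most caps[type] artifacts per type."""
    attached, capped = [], []
    per_type: Dict[str, int] = defaultdict(int)
    for artifact in sorted(artifacts, key=sort_key, reverse=True):
        kind = artifact["type"]
        if per_type[kind] < caps.get(kind, default_cap):
            per_type[kind] += 1
            attached.append(artifact)
        else:
            capped.append(artifact)  # would be recorded as a kind: "capped" row
    return attached, capped
```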

12.3 — Ranking-signal contract pin (✅ Shipped)

Status. Shipped 2026-05-18. _content_from_proposal is documented + tested to preserve evaluator_score, confidence, and weight through promote. TestRankingSignalContract fails the build if any of the three signals is added to the strip list.

12.4 — Evaluator score writeback (✅ Shipped)

Status. Shipped 2026-05-18. apollo/evaluator/persist.py:persist_score_to_artifact writes the in-memory scoring engine's score + decomposition back to content.evaluator_score after every signal application. Fire-and-forget; never blocks the ingest hot path. Kill switch: APOLLO_EVALUATOR_PERSIST_SCORES_ENABLED.

12.5 — Synthesis confidence emission (✅ Shipped)

Status. Shipped 2026-05-18. Every synthesis prompt schema requires confidence: 0.0..1.0; _SHARED_RULES explains the semantics. _normalize_confidence clamps + coerces in _record_proposal.
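"Clamps + coerces" could look like the following sketch. It is not the shipped _normalize_confidence: the fallback value for unparseable input is an assumption.

```python
import math
from typing import Any


def normalize_confidence(value: Any, default: float = 0.0) -> float:
    """Coerce an LLM-provided confidence to a float in [0.0, 1.0]."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return default  # e.g. None or "high"
    if math.isnan(number):
        return default
    return min(1.0, max(0.0, number))
```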

12.6 — Deepened rationale summary + per-artifact aggregation (✅ Shipped)

Status. Shipped 2026-05-18. _summarize now names attached + capped artifact IDs per type, truncating to top-5 with a +N tail. aggregate_artifact_stats exposes the same data per artifact.
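The top-5 truncation with a +N tail can be sketched as below. The exact output format of _summarize is not specified here, so the separators and labels are illustrative assumptions.

```python
from typing import Dict, Iterable, List


def summarize_ids(ids: Iterable[str], limit: int = 5) -> str:
    """Join up to `limit` ids and append "+N" for the remainder."""
    ids = list(ids)
    shown = ", ".join(ids[:limit])
    extra = len(ids) - limit
    return f"{shown} +{extra}" if extra > 0 else shown


def summarize(attached: Dict[str, List[str]], capped: Dict[str, List[str]]) -> str:
    """One clause per (attached|capped, artifact type) pair."""
    parts = []
    for label, groups in (("attached", attached), ("capped", capped)):
        for artifact_type in sorted(groups):
            parts.append(f"{label} {artifact_type}: {summarize_ids(groups[artifact_type])}")
    return "; ".join(parts)
```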

12.7 — Promote-time similarity advisory (✅ Shipped)

Status. Shipped 2026-05-18. apollo/learner/similarity.py reuses axonis.memory.embedder (gated by the [memory] extra). Promote computes the new artifact's embedding, stores it on content.embedding_vector, and scans active artifacts at the same scope for matches above APOLLO_SIMILARITY_THRESHOLD (default 0.9). Returns hits in ActionResult.similar_artifacts — promote still succeeds.
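The advisory scan can be illustrated with plain cosine similarity over stored vectors. Cosine is an assumption (the similarity measure is not named here), as are the function names and the artifact dict shape; the threshold default mirrors APOLLO_SIMILARITY_THRESHOLD.

```python
import math
from typing import Iterable, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_artifacts(
    new_vector: Sequence[float],
    active: Iterable[dict],
    threshold: float = 0.9,
) -> List[dict]:
    """Return advisory hits at or above the threshold, most similar first."""
    hits = []
    for artifact in active:
        score = cosine(new_vector, artifact["embedding_vector"])
        if score >= threshold:
            hits.append({"id": artifact["id"], "similarity": score})
    return sorted(hits, key=lambda hit: hit["similarity"], reverse=True)
```

Because the result is advisory, a caller would attach it to the promote response rather than abort the promote.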

12.8 — Curator-time similarity sweep (✅ Shipped)

Status. Shipped 2026-05-18. apollo/learner/coalescer.py is a fifth background loop. Each tick partitions active artifacts by (type, service, tool), union-finds clusters above APOLLO_COALESCER_THRESHOLD (default 0.85), and calls Apollo's LLM via build_coalesce_prompt to write a merger. Merger lands on apollo_proposals carrying supersedes: [...]. promote() extended to honor the list. Off by default (APOLLO_COALESCER_ENABLED=false).
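The partition-then-cluster step of a tick can be sketched with a small union-find. Cosine similarity, the vec field, and the artifact dict shape are assumptions for illustration; the LLM merge call and the apollo_proposals write are out of scope for the sketch.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import Iterable, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 0.0 if norm_a == 0.0 or norm_b == 0.0 else dot / (norm_a * norm_b)


def find(parent: List[int], i: int) -> int:
    # Path halving keeps the trees shallow.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def cluster(artifacts: Iterable[dict], threshold: float = 0.85) -> List[List[str]]:
    """Group artifact ids into merge candidates within each (type, service, tool)."""
    partitions = defaultdict(list)
    for artifact in artifacts:
        key = (artifact["type"], artifact["service"], artifact["tool"])
        partitions[key].append(artifact)

    clusters = []
    for members in partitions.values():
        parent = list(range(len(members)))
        for i, j in combinations(range(len(members)), 2):
            if cosine(members[i]["vec"], members[j]["vec"]) >= threshold:
                root_i, root_j = find(parent, i), find(parent, j)
                if root_i != root_j:
                    parent[root_j] = root_i  # union the two sets
        groups = defaultdict(list)
        for index, artifact in enumerate(members):
            groups[find(parent, index)].append(artifact["id"])
        # Singletons have nothing to merge with, so skip them.
        clusters.extend(sorted(ids) for ids in groups.values() if len(ids) > 1)
    return sorted(clusters)
```

Each returned cluster would be the input to one merge prompt, with the resulting proposal carrying the cluster's ids in supersedes.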

12.9 — Cap-defaults empirical study (🚧 In flight — needs accumulated data)

Need. The per-type caps (§12.2) and similarity thresholds (§12.7, §12.8) ship with reasonable defaults but no production data behind them. Once §12.1's lineage rows + §12.4's score writebacks have run for a representative period (~weeks), revisit:
- Are the per-type caps biting on the right artifacts? apollo_guidance_attach_capped_total{scope, artifact_type} shows distribution.
- Should the cap shape be artifact-count or token-budget?
- Are similarity thresholds (0.9 promote-time / 0.85 sweep-time) tight enough to prevent over-coalescing, or loose enough to catch duplicates?

Trigger. Not code work — an empirical pass once telemetry is available. Output is a short addendum to §Prioritization Layers recommending any default changes.

Picking items off this list

Suggested triage when considering what to pick up:
1. If a related milestone is re-opening (e.g., a future M14b for beacon's L1 wiring or a milestone for parallax onboarding), pick P1 items in that subsystem.
2. If production data shows pressure, promote the matching P2 to P1.
3. If an external prerequisite lands, unblock the P3 it gates (Keycloak client-credentials → 8.1; production MiniMax endpoint → 3.1).

No item is urgent. All are tracked here so they don't get rediscovered cold. Maintain by appending items as they're identified; remove items when they're addressed (link the commit in the commit message).


Depends on: component.beacon.workbench, component.cortex.intelligence, component.oracle.gateway, platform.axonis-core, platform.observability, platform.service-contract

Required by: component.beacon.ticketing, component.oracle.gateway