Apollo — System-Wide Observation, Learning, and Guidance Layer
Status: Design
Package: oracle.apollo (lives inside oracle, not exposed as a separate service)
Depends on: platform.axonis-core, platform.service-contract, component.oracle.gateway
Milestone: P3 (after oracle is operational in production)
Purpose
Apollo is the reasoning and memory layer over the platform's LLM activity. It sits inside oracle, observes every LLM and tool interaction across a three-layer system, distills durable artifacts from those observations, and exposes learned guidance back to the layers that need it.
Apollo is an observer, learner, and advisor. It does not execute workflows, does not call tools, does not retry failed requests, and does not interrupt live LLM calls. Iteration is driven by layer 1 (the front-end prompt/schema generator); Apollo's role is to make each successive iteration better informed than the last.
Apollo has its own LLM, its own memory, and an autonomous curator that maintains its own artifacts — but empowerment is strictly bounded to Apollo's internal state. Apollo cannot change auth, guardrails, token scopes, or user data.
Three-Layer Context
┌──────────────────────────────────────────────────────────────┐
│ Layer 1: Front-end │
│ - generates prompts + schemas for requests │
│ - consumes Apollo's guidance to shape future prompts │
│ - decides when to re-run a workflow │
└────────────────────┬─────────────────────────────────────────┘
│ intent, prompt, schema
▼
┌──────────────────────────────────────────────────────────────┐
│ Layer 2: Oracle / Apollo │
│ Oracle: auth, routing, LLM dispatch, tool aggregation │
│ Apollo: observe, learn, advise, curate │
└────────────────────┬─────────────────────────────────────────┘
│ tool calls, sub-LLM calls
▼
┌──────────────────────────────────────────────────────────────┐
│ Layer 3: Backend agents + libraries │
│ - agents (parallax, cortex, ...) — LLM-driven │
│ - libraries (UDS, uds.*, ...) — operational, no LLM │
│ - execute domain logic, return outputs │
│ - oracle observes the MCP round-trip and emits on their │
│ behalf (L3 never addresses Apollo directly) │
│ - agents consume injected guidance; libraries do not │
└──────────────────────────────────────────────────────────────┘
Apollo observes the full lineage of each request: intent (layer 1) → routing and LLM reasoning (layer 2) → execution and outcome (layer 3) → final response back to layer 1.
Apollo's reasoning output reaches the LLMs in L1 and L3 by attaching guidance to the existing request/response flow — symmetric piggybacking in both directions:
- L1: oracle attaches current applicable guidance to every /chat response body.
- L3: oracle attaches current applicable guidance to every outbound MCP tool dispatch bound for an agent-kind service.
Both paths ride the ambient auth of the envelope they are embedded in, so no service-token infrastructure is needed and no long-lived connection is maintained. Guidance is always fresh (computed at request time, inside oracle, with Apollo as an in-process call). See §Injection Channel.
Communication with L1 and L3 is unidirectional: oracle attaches Apollo's guidance to outbound envelopes (responses to L1, MCP dispatches to L3); L1 and L3 never query Apollo for guidance and never emit observations to Apollo directly. Oracle is Apollo's sole emitter — it extracts L1 signals from /chat request bodies and it observes L3 outputs by watching the MCP round-trip, calling oracle.apollo.observer.ingest in-process on both layers' behalf (§Invariants 14, §Ingest Semantics). Oracle is also a guidance subscriber for its own chat LLM (L2): the tool-executor at oracle/server/llm/tool_executor.py consults a process-local ApolloGuidanceCache on each turn — no transport involved (§L2 path). Admin tooling is the only exception — admins use GET /api/v1/apollo/guidance and related endpoints for inspection, and may POST to /api/v1/apollo/observations for replay/seed.
Apollo does not observe L1's or L3's internal LLM turns. llm_turn events are emitted by oracle (L2) only. Apollo learns about L1, L2, and L3 LLMs indirectly — from L1/L3 outputs (intent_schema / user_prompt for L1; tool_output / tool_error for L3), from oracle's own llm_turn and final_response (L2), and from outcome correlation — and improves them prospectively by injecting updated guidance into their prompt context.
Trace Propagation
Apollo relies on a single trace_id shared across every observation emitted for one end-to-end request. Trace propagation follows the W3C Trace Context standard (traceparent header) end-to-end across L1, L2, and L3.
This is the concrete realization of the OpenTelemetry aspiration noted in component.oracle.gateway (Oracle) and introduces no conflict with axonis-core: neither axonis-core nor platform.axonis-core/02/03 currently defines a trace, request-id, or correlation-id header. The only header propagation today is Authorization via axonis_core.gateway.client.extract_http_headers() (platform.axonis-core). Apollo's adoption is additive.
Header
traceparent: 00-<trace-id 32 hex>-<parent-span-id 16 hex>-<flags 2 hex>
Format: W3C Trace Context Level 1. Apollo uses only the trace-id segment for lineage stitching; parent-span-id and flags are preserved for standards compliance and future OpenTelemetry interop but are not interpreted by Apollo's lineage layer.
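As a minimal sketch of the "handful of lines" needed to read this header, the following parser extracts the three segments and rejects malformed values. The names `TraceParent` and `parse_traceparent` are illustrative assumptions, not identifiers from axonis-core.

```python
import re
from typing import NamedTuple, Optional

# W3C Trace Context Level 1: version "00", 32-hex trace-id,
# 16-hex parent-id, 2-hex flags, all lowercase.
_TRACEPARENT_RE = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class TraceParent(NamedTuple):
    trace_id: str
    parent_span_id: str
    flags: str


def parse_traceparent(value: str) -> Optional[TraceParent]:
    """Return the parsed header, or None when it is malformed."""
    match = _TRACEPARENT_RE.fullmatch(value.strip())
    if match is None:
        return None
    trace_id, parent_span_id, flags = match.groups()
    # The W3C spec treats all-zero trace-id and parent-id values as invalid.
    if set(trace_id) == {"0"} or set(parent_span_id) == {"0"}:
        return None
    return TraceParent(trace_id, parent_span_id, flags)
```

Only `trace_id` would feed lineage stitching; the other two fields are carried along untouched.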
Who mints
- L1 mints the root traceparent on every new request and sets it on the outbound HTTP call to oracle /chat (and equivalent endpoints). L1 does not call Apollo directly (§Invariants 14); oracle re-emits L1-origin observations in-process, reusing the same trace-id.
- If oracle receives a request without a traceparent header (e.g., a pre-W3C client), oracle mints one, logs a missing_traceparent telemetry event, and surfaces the minted trace-id in the response so callers can correlate if they choose.
How it travels
| Hop | Carrier |
|---|---|
| L1 → L2 (HTTP to oracle) | traceparent request header |
| L2 → L3 (MCP tool dispatch) | traceparent HTTP header on the POST to the service's MCP endpoint (same transport as the existing Authorization forward) |
| L2 → L3 (HTTP fallback, non-MCP) | traceparent request header |
| L3 → L2 (MCP response → oracle) | traceparent is preserved by oracle's MCP client across the round-trip; oracle stamps the same trace_id on the tool_output / tool_error envelope it emits in-process |
| Admin seed → Apollo (POST /observations) | traceparent request header and trace_id field in the envelope |
| Out-of-process emitter → Apollo (secondary) | traceparent request header and trace_id field in the envelope (envelope is authoritative) |
Oracle is the only L2 hop and is responsible for forwarding the inbound traceparent unchanged on every downstream call that belongs to the same request. Oracle never re-mints mid-request.
axonis-core integration
Trace header propagation ships as an additive change to axonis-core — it lives with the existing cross-service header plumbing, not in oracle-only code:
- axonis_core.gateway.client.extract_http_headers(): extended to forward traceparent alongside Authorization. This is the single source of truth for cross-service header propagation and is used by both MCPClient and RestClient.
- axonis_core.gateway.mcp_client.MCPClient: reads traceparent from the inbound request context and sets it as an HTTP header on outbound MCP POSTs, alongside the existing Authorization forward.
- axonis_core.gateway.rest_client.RestClient: reads traceparent from the inbound context and sets it as an HTTP header on outbound REST calls.
- ApolloClient (component.oracle.apollo §Ingest Semantics, in axonis-core): used by admin replay and any future out-of-process emitter; reads the ambient traceparent from request context and sets it as an HTTP header on every POST /api/v1/apollo/observations call and into the envelope's trace_id field (the envelope wins on conflict; see §Envelope mapping). Phase-1 emitters do not use this client; oracle emits in-process and carries trace_id on the envelope it builds directly.
No new dependency is added to axonis-core — parsing the 4-segment traceparent string is a handful of lines; no OpenTelemetry SDK is required. A future OpenTelemetry integration can consume the same header without change.
Envelope mapping
Apollo's observation envelope fields map to W3C Trace Context as follows:
| Envelope field | W3C source | Purpose |
|---|---|---|
| trace_id | traceparent.trace-id (32-hex) | Shared by all events for one end-to-end request |
| parent_trace_id | not derived from traceparent | Set by emitter only when this trace is a sub-request spawned from a separate enclosing trace (e.g., a scheduled background workflow). Null otherwise. |
parent_trace_id is not the same as W3C parent-span-id. Apollo does not track span hierarchy within a single trace — its per-event observation cadence (§Observation Model → Observation cadence) makes span-level granularity unnecessary. parent_trace_id is used only for cross-trace fork linkage.
Configuration
- APOLLO_TRACE_HEADER: header name. Default traceparent (W3C). Configurable only to ease staged rollout against pre-W3C emitters; always traceparent in production.
- APOLLO_REQUIRE_TRACEPARENT: when true, oracle rejects inbound requests without a valid traceparent. Default false through Phases 1–2 (oracle mints on absence). Flip to true in Phase 3 alongside APOLLO_REQUIRE_INTENT_SCHEMA once emitter coverage is proven.
Failure posture
- Missing header (best-effort): oracle mints, logs missing_traceparent, serves the request. Lineage still stitches because the minted trace-id flows downstream and is used by oracle's own observations.
- Missing header (required mode): oracle rejects with 400; emitter must include traceparent.
- Malformed header: oracle rejects with 400 in required mode; logs malformed_traceparent and mints a replacement in best-effort mode.
- Envelope trace_id differs from header: the envelope value wins, since it is the emitter's authoritative signal. Oracle logs the discrepancy for diagnostics.
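The failure posture can be read as a small decision function. The sketch below is one possible shape, assuming a hypothetical `resolve_trace` helper and a hypothetical `trace_id_mismatch` event name for the discrepancy log; only `missing_traceparent` and `malformed_traceparent` come from this document.

```python
import re
import secrets
from dataclasses import dataclass, field
from typing import List, Optional

_TRACEPARENT_RE = re.compile(r"00-([0-9a-f]{32})-[0-9a-f]{16}-[0-9a-f]{2}")


@dataclass
class TraceDecision:
    status: int                      # 200 = serve the request, 400 = reject
    trace_id: Optional[str] = None   # trace-id stamped on this request's observations
    events: List[str] = field(default_factory=list)  # telemetry events to log


def mint_traceparent() -> str:
    """Mint a fresh traceparent value (hypothetical helper)."""
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-00"


def resolve_trace(header: Optional[str], require: bool,
                  envelope_trace_id: Optional[str] = None) -> TraceDecision:
    match = _TRACEPARENT_RE.fullmatch(header) if header else None
    events: List[str] = []
    if match is None:
        if require:
            # Required mode: missing and malformed headers are both rejected.
            return TraceDecision(status=400)
        # Best-effort mode: log the reason and mint a replacement.
        events.append("missing_traceparent" if header is None
                      else "malformed_traceparent")
        match = _TRACEPARENT_RE.fullmatch(mint_traceparent())
    trace_id = match.group(1)
    if envelope_trace_id and envelope_trace_id != trace_id:
        # The envelope is authoritative; the mismatch is only logged.
        events.append("trace_id_mismatch")  # assumed event name
        trace_id = envelope_trace_id
    return TraceDecision(200, trace_id, events)
```

Passing `require=True` corresponds to `APOLLO_REQUIRE_TRACEPARENT=true`; the default best-effort behaviour corresponds to `require=False`.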
Package Structure
oracle/
  apollo/
    __init__.py
    observer/
      __init__.py
      ingest.py        # observation normalization + routing into memory
      events.py        # event type definitions (Pydantic models)
    memory/
      __init__.py
      store.py         # wraps axonis-core Memory UDS + ElasticQuery
    learner/
      __init__.py
      synthesis.py     # event-driven LLM synthesis dispatcher (primary driver)
      graphs.py        # Decision Graph set: nodes, edges, weights, mutations (supplemental anchor)
      extractors.py    # observation → decision points (deterministic; feeds graphs)
      snapshots.py     # versioned graph snapshots for past/current temporal analysis
      trajectory.py    # projection of future decisions from current graph state
      drift.py         # graph-anchor check on LLM outputs; drift-vs-evolution detection
      prompts.py       # prompt templates for the synthesis LLM
    guidance/
      __init__.py
      api.py           # admin inspection endpoints (GET /guidance*)
      attacher.py      # in-process helper oracle calls to attach guidance to responses and MCP dispatches
      selectors.py     # intent → artifacts matching logic
    curator/
      __init__.py
      actions.py       # promote / demote / forget / edit / compact
      policy.py        # bounded-empowerment rules
      audit.py         # audit log writer (Elastic `apollo_audit` index)
    evaluator/
      __init__.py
      scoring.py       # grades artifacts by outcome correlation; L3-performance amplification
      signals.py       # failure-signal detectors (see §Evaluator)
      cascade.py       # upstream-artifact re-flag on L3-driven score drops
    chat/
      __init__.py
      server.py        # admin-only conversational interface
      tools.py         # memory-management tools exposed to Apollo's own LLM
    artifacts.py       # typed artifact schemas (IntentPattern, FailurePattern, ...)
    llm.py             # Apollo's own LLM client (separate config)
    settings.py        # env-driven configuration
  server/
    __main__.py        # mounts /api/v1/apollo/* routes from oracle.apollo.guidance + chat
Apollo is a package inside oracle, mounted into oracle's existing Starlette app at /api/v1/apollo/*. It is not a separate service and does not have its own __main__.py. The oracle invariant ("oracle is the only externally exposed service" — component.oracle.gateway §Invariants 1) is preserved.
Observation Model
Event types
Apollo recognizes the following event types. In Phase 1 all of them are emitted by oracle, either for its own activity or on behalf of L1 and L3:
| Event type | Emitter | Purpose |
|---|---|---|
| intent_schema | Oracle (from L1 /chat request body) | Front-end's generator schema for this request |
| user_prompt | Oracle (from L1 /chat request body) | Concrete prompt produced from the intent schema |
| llm_turn | Oracle (layer 2) | One LLM request/response cycle inside oracle |
| tool_output | Oracle (from L3 MCP response) | Successful tool execution: inputs, outputs, latency |
| tool_error | Oracle (from L3 MCP response) | Tool failure: inputs, error message, stack trace, latency |
| final_response | Oracle | What was returned to layer 1 at the end of a conversation turn |
| user_feedback | Oracle (from L1 feedback submission) | Thumbs up/down, correction, explicit follow-up signal |
Emission paths are covered in detail in §Ingest Semantics. In summary: every Phase-1 event (L1-origin, L2-origin, and L3-origin) is emitted by oracle in-process — neither L1 nor L3 addresses Apollo directly (§Invariants 14). POST /api/v1/apollo/observations remains mounted for admin replay/seed and for future services running outside oracle's MCP dispatch reach.
Observation payload
All observations share a common envelope:
{
  "event_type": "tool_output",
  "trace_id": "trc_abc123",
  "parent_trace_id": "trc_def456",
  "conversation_id": "conv_xyz",
  "service": "parallax",
  "timestamp": "2026-04-17T10:30:00Z",
  "caller_identity": {"username": "...", "roles": [...]},
  "emitted_by": {"token_subject": "...", "token_roles": [...], "context": "http"},
  "payload": { ... event-specific fields ... }
}
caller_identity vs emitted_by. Apollo records two attribution axes per observation:
- caller_identity: application-asserted. Who the work is attributed to. Set by the emitter (often a service token stamping observations on behalf of an end user; e.g., cortex emits caller_identity.username="alice" because alice's /chat request fanned out to cortex). The handler stamps this from the Bearer token only when the envelope didn't carry one.
- emitted_by: server-stamped, unforgeable. Who actually pushed the bytes. Always overwritten by Apollo's ingest handler (HTTP path) or in-process emit helper (oracle hosts Apollo). Carries the validated token subject, roles, and a context ("http" or "in_process"). Emitters cannot forge it; the handler ignores any inbound value and stamps from request.state.token_payload.
Audit query: rows where caller_identity.username != emitted_by.token_subject and emitted_by.token_subject is not a known service principal → flag for review. The two-axis model preserves the legitimate cross-attribution pattern (services emitting per-user observations) while making forging detectable.
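To make the two-axis model concrete, here is a minimal sketch of the stamping rule and the audit predicate over plain dictionaries. The function names and the token payload keys (`sub`, `roles`) are assumptions for illustration; the document only specifies the envelope fields.

```python
from typing import Any, Dict, Set


def stamp_attribution(envelope: Dict[str, Any], token_payload: Dict[str, Any],
                      context: str) -> Dict[str, Any]:
    """Return a copy with emitted_by overwritten and caller_identity defaulted."""
    stamped = dict(envelope)
    roles = list(token_payload.get("roles", []))
    # emitted_by is always server-stamped; any inbound value is discarded.
    stamped["emitted_by"] = {
        "token_subject": token_payload["sub"],
        "token_roles": roles,
        "context": context,  # "http" or "in_process"
    }
    # caller_identity comes from the token only when the emitter omitted it.
    if not stamped.get("caller_identity"):
        stamped["caller_identity"] = {"username": token_payload["sub"], "roles": roles}
    return stamped


def needs_review(envelope: Dict[str, Any], service_principals: Set[str]) -> bool:
    """Audit predicate: attribution mismatch from a non-service emitter."""
    caller = envelope["caller_identity"]["username"]
    emitter = envelope["emitted_by"]["token_subject"]
    return caller != emitter and emitter not in service_principals
```

A service principal stamping `alice` as the caller passes the predicate; the same envelope pushed by an unknown subject is flagged.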
Observation cadence (locked)
Apollo records one observation per:
- Turn boundary (each LLM request/response cycle)
- Tool invocation (tool_output or tool_error)
- Error
- Final response returned to layer 1
Apollo does not record per-token events. Token-level observation is too noisy for the learner and would cause drift in learned artifacts. This is a drift-prevention decision.
Lineage
Every observation carries a trace_id derived from the W3C traceparent header propagated across L1 → L2 → L3 (§Trace Propagation). Related observations (all events from one end-to-end request) share the same trace_id. Cross-trace sub-requests (e.g., scheduled background workflows spawned from a chat turn) use parent_trace_id for hierarchy.
Memory Model
Apollo's memory is two-tiered:
- Raw observations: the events listed above, stored in the Elastic apollo_observations index. High volume, time-boxed retention.
- Learned artifacts: structured, versioned objects produced by the Learner. Stored in the Elastic apollo_artifacts index. Low volume, long-lived.
Both indices use the Memory(UDS) class from axonis_core.userspace.intelligence as their UDS primitive (platform.axonis-core §Memory Pattern), specialized via subclassing. Apollo does not re-implement the storage surface.
Artifact types
| Artifact | Description |
|---|---|
| DecisionGraph | A specialized graph of decision points and transitions (see §Decision Graphs) |
| DecisionTrajectory | Smoothed trajectory of a graph's evolution over time |
| DriftEvent | Flagged structural shift in a decision graph requiring review or explanation |
| IntentPattern | Recurring front-end intent → successful tool/service routing and output shape |
| IntentSchema | Known layer 1 generator schemas Apollo has learned to recognize |
| SchemaDrift | Layer 1 started emitting a new or changed schema; flagged for admin review |
| PromptShape | Recurring prompt structure correlated with good/bad outcomes |
| ToolPairingHint | "Tool X is usually followed by tool Y in successful runs" |
| FailurePattern | Known failure mode with diagnostic signature and recommended remediation |
| ServiceConnectionHint | "For intent of class Q, service S gives better results than service S'" |
| SpecFragment | Short, targeted spec snippet relevant to a class of intent |
| PromptShim | System-prompt addition that improves outcomes for a class of intent |
| CapabilityMap | Distilled view of which services can satisfy which intents |
Each artifact is a Pydantic model in apollo/artifacts.py backed by a UDS class. Artifacts are versioned — see §Curator.
Index mappings and templates
Every Apollo index is a flat Elasticsearch index (not a data stream, no ILM policy). Mappings are shipped as JSON templates under oracle/apollo/templates/, following the same convention as rest/uds/templates/*_mapping.json:
- apollo_observations_mapping.json
- apollo_artifacts_mapping.json
- apollo_artifact_history_mapping.json
- apollo_graph_nodes_mapping.json
- apollo_graph_edges_mapping.json
- apollo_graph_snapshots_mapping.json
- apollo_audit_mapping.json
Every mapping includes the standard UDS block (uds.timestamp, uds.username, uds.visibility), create_ts, update_ts, schema_version, and — for time-limited indices — an expires_ts date field (same pattern as rest/uds/templates/memory_mapping.json). Every index follows the Memory(UDS) / Elastic base-class pattern from platform.axonis-core so that CRUD goes through axonis_core.elastic.Elastic.
Retention
Retention is application-managed, not Elastic-ILM-managed. This matches the codebase convention: axonis-core and rest/uds/ do not configure ILM policies, rollovers, or data streams. Each Apollo document that has a bounded lifetime carries an expires_ts field; a periodic maintenance task (see below) runs Elastic.delete_by_query filtering on expires_ts < now() to reclaim space.
| Class | Index | Expiry mechanism | Retention |
|---|---|---|---|
| Raw observations | apollo_observations | expires_ts = create_ts + 30d set on write | 30 days |
| Graph snapshots (hot) | apollo_graph_snapshots | expires_ts set by coarsening task (see below) | Hourly granularity for 7 days |
| Graph snapshots (warm) | apollo_graph_snapshots | Daily snapshots retained after coarsening | Daily granularity for 30 days |
| Graph snapshots (cold) | apollo_graph_snapshots | Weekly snapshots retained after coarsening | Weekly granularity for 90 days total |
| Learned artifacts | apollo_artifacts | No expires_ts; lifecycle driven by Curator | Indefinite; forgotten by admin or Evaluator-demoted N cycles |
| Artifact history | apollo_artifact_history | No expires_ts | Indefinite (rollback substrate) |
| Audit log | apollo_audit | expires_ts = create_ts + 90d or null for indefinite | ≥ 90 days (configurable) |
Maintenance task. A periodic background job (default hourly, configurable via APOLLO_MAINTENANCE_INTERVAL) performs:
1. delete_by_query on any index where expires_ts < now()
2. Coarsening on apollo_graph_snapshots: hourly rows older than APOLLO_SNAPSHOT_HOURLY_TO_DAILY_AGE_DAYS (default 7) are grouped by (graph_id, calendar date); the most recent row in each group is re-tagged tier="daily" and the rest deleted. Same shape at the daily→weekly boundary: dailies older than APOLLO_SNAPSHOT_DAILY_TO_WEEKLY_AGE_DAYS (default 30) collapse to one weekly row per (graph_id, ISO week). Both windows are env-overridable; see apollo/settings.py for the documented operator profiles, validation rules, and storage trade-offs.
3. Optional Learner-driven compaction of observations near TTL into apollo_artifacts summaries (event-driven: compaction runs on admin-initiated synthesis or guidance miss, not in this maintenance pass).
The maintenance task uses axonis_core.elastic.Elastic.delete_by_query — no Apollo-specific Elastic client.
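The hourly→daily coarsening step in item 2 can be sketched as a pure function over snapshot rows, independent of Elastic. This is an illustration under assumptions: rows are plain dictionaries with an `id` field (hypothetical) plus `graph_id`, `tier`, and a `create_ts` datetime, and the function name is invented here.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def coarsen_hourly_to_daily(snapshots: List[dict], now: datetime,
                            age_days: int = 7) -> Tuple[List[dict], List[str]]:
    """Return (rows to re-tag as daily, ids to delete) for aged hourly rows."""
    cutoff = now - timedelta(days=age_days)
    groups: Dict[tuple, List[dict]] = defaultdict(list)
    for snap in snapshots:
        # Only hourly rows older than the cutoff are candidates.
        if snap["tier"] == "hourly" and snap["create_ts"] < cutoff:
            groups[(snap["graph_id"], snap["create_ts"].date())].append(snap)

    keep: List[dict] = []
    delete_ids: List[str] = []
    for rows in groups.values():
        rows.sort(key=lambda r: r["create_ts"])
        # The most recent row per (graph_id, calendar date) survives as daily.
        keep.append(dict(rows[-1], tier="daily"))
        delete_ids.extend(r["id"] for r in rows[:-1])
    return keep, delete_ids
```

The daily→weekly pass would have the same shape with the grouping key swapped for `(graph_id, ISO week)`, for example via `create_ts.isocalendar()[:2]`.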
Retention summary
- Raw observations: 30 days, then delete_by_query.
- Graph snapshots: 90 days total, tiered (7d hourly → 30d daily → 90d weekly) via application-level coarsening.
- Artifacts: indefinite; Curator manages lifecycle; prior versions preserved in apollo_artifact_history forever.
- Audit log: ≥ 90 days.
Learner
Apollo's Learner is LLM-driven, graph-anchored. Apollo's LLM (see §Apollo's LLM) is the primary engine of synthesis: it processes observations as they arrive (event-driven — see §LLM synthesis below), creates and refines artifacts, classifies intents, diagnoses outcomes, and drives admin chat. The decision graphs are supplemental — they provide deterministic grounding that keeps the LLM anchored and prevents it from drifting.
The relationship is: the LLM reasons flexibly; the graphs remember rigidly. Every LLM call reads the relevant graph state as grounding context. Every LLM output is checked against the graph's trajectory. The LLM cannot propose a pattern that contradicts what the graphs have deterministically recorded without being flagged as drift.
Decision Graphs
Apollo maintains a series of specialized graphs rather than one monolithic graph. Each graph captures a different decision surface:
| Graph | Nodes | Edges |
|---|---|---|
| intent_tool_graph | Intent classes, tool identifiers | "Intent → tool chosen" with outcome weight |
| prompt_shape_graph | Prompt structure clusters | "Shape A evolved into shape B in later iteration" |
| service_routing_graph | Intent classes, backend services | "Intent → service picked" with outcome weight |
| outcome_graph | Decision points, outcome classes | "Decision → outcome produced" with frequency |
| iteration_graph | States within a layer-1 re-run chain | "Iteration N → Iteration N+1 decision delta" |
Cross-graph links exist where decisions in one graph point to nodes in another (e.g., a tool-selection node in intent_tool_graph links to the outcome node in outcome_graph).
Node and edge model
Each node carries:
- id, graph_id, kind, label
- Occurrence count, first-seen / last-seen timestamps
- Outcome distribution (aggregated from incoming observations)
- Tags for retrieval
Service-namespaced labels
Every label that is naturally service-scoped is prefixed with the
emitting service: <envelope.service>/<label>. Concretely, the
extractors namespace labels for the following node kinds:
| Graph | Kind | Example label |
|---|---|---|
| intent_tool_graph | intent, tool | cortex/screening, cortex/summarize |
| prompt_shape_graph | prompt_shape | oracle/shape:20:a3f1b2c0 |
| service_routing_graph | intent | parallax/screening |
| outcome_graph | decision_point | cortex/tool:summarize, oracle/conversation:conv_42 |
| iteration_graph | iteration_state | oracle/iter:trc_1234 |
Two node kinds are intentionally not prefixed:
- service nodes in service_routing_graph carry the service name itself as their identity (e.g., bare cortex). Prefixing would yield the meaningless label cortex/cortex.
- outcome nodes in outcome_graph carry universal categorical labels (success, error, feedback_up, feedback_down, feedback_abandoned). The per-service split is carried by the decision_point side of the edge, not by fragmenting the outcome taxonomy.
This rule means two backend services that register a tool with the same
name (e.g. cortex/summarize and parallax/summarize) form distinct
nodes and accumulate counts, EWMA weights, and outcome distributions
independently. Downstream synthesis (M8), drift detection (M12), and
evaluator scoring (M10) therefore operate on per-service signal rather
than a cross-service average.
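The namespacing rule reduces to a one-branch helper. The sketch below assumes a hypothetical `node_label` function; the two exempt kinds are the ones named above.

```python
# Node kinds that keep a bare label (see the two exceptions above).
UNPREFIXED_KINDS = {"service", "outcome"}


def node_label(service: str, kind: str, label: str) -> str:
    """Prefix a label with its emitting service unless the kind is exempt."""
    if kind in UNPREFIXED_KINDS:
        return label
    return f"{service}/{label}"
```

With this rule, `node_label("cortex", "tool", "summarize")` and `node_label("parallax", "tool", "summarize")` yield different node identities, while `success` stays a single shared outcome node.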
Each edge carries:
- source_id, target_id
- Weight (an outcome-correlation-adjusted transition probability)
- Count, first-seen / last-seen
- Recent-window weight (exponentially-weighted moving average over a short horizon)
- Long-window weight (EWMA over a long horizon)
The divergence between recent-window and long-window weights is the primary drift signal.
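As an illustration of the two-horizon weights, the sketch below keeps one EWMA per horizon and exposes their difference. The smoothing factors are placeholder assumptions, not values from this design, and the plain outcome score stands in for the outcome-correlation-adjusted weight.

```python
from dataclasses import dataclass


@dataclass
class EdgeWeights:
    recent: float = 0.0   # short-horizon EWMA
    long: float = 0.0     # long-horizon EWMA

    def update(self, outcome: float, alpha_recent: float = 0.3,
               alpha_long: float = 0.02) -> None:
        """Fold one outcome score (e.g. 1.0 success, 0.0 failure) into both EWMAs."""
        self.recent += alpha_recent * (outcome - self.recent)
        self.long += alpha_long * (outcome - self.long)

    @property
    def divergence(self) -> float:
        """Recent-minus-long gap; a large magnitude suggests a behaviour shift."""
        return self.recent - self.long
```

A larger `alpha_recent` makes the short window react within a few observations, while the small `alpha_long` keeps the baseline stable; the gap between them grows when behaviour shifts abruptly and shrinks as the long window catches up.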
Graph updates (per observation, deterministic)
The Learner's extractors run deterministically on every ingested observation:
- Extract decision points (e.g., "intent class", "tool called", "service routed", "outcome class") using rules and lightweight matchers.
- Upsert nodes: create new if absent, increment count and update last-seen if present.
- Upsert edges: create new or reinforce. Update short-window and long-window weights.
- Attach the observation's trace_id to the affected nodes/edges for lineage queries.
No LLM call. No new free-form artifacts. Graph mutations only. This path is the grounding layer — it records what has actually happened in the system, with no interpretation.
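The deterministic update path can be sketched as plain upserts with no interpretation step. The class and function names here are illustrative, and the payload fields `intent_class` and `tool` are assumptions about what an extractor would read from a tool_output observation; weight updates are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class Stats:
    """Bookkeeping shared by nodes and edges."""
    count: int = 0
    first_seen: str = ""
    last_seen: str = ""
    trace_ids: Set[str] = field(default_factory=set)

    def touch(self, ts: str, trace_id: str) -> None:
        if self.count == 0:
            self.first_seen = ts          # set once, on creation
        self.count += 1
        self.last_seen = ts
        self.trace_ids.add(trace_id)      # lineage link back to the observation


class DecisionGraph:
    def __init__(self) -> None:
        self.nodes: Dict[str, Stats] = {}
        self.edges: Dict[Tuple[str, str], Stats] = {}

    def upsert_node(self, label: str, ts: str, trace_id: str) -> None:
        self.nodes.setdefault(label, Stats()).touch(ts, trace_id)

    def upsert_edge(self, source: str, target: str, ts: str, trace_id: str) -> None:
        self.edges.setdefault((source, target), Stats()).touch(ts, trace_id)


def apply_tool_output(graph: DecisionGraph, obs: dict) -> None:
    """Record one intent → tool decision from a tool_output observation."""
    service = obs["service"]
    intent = f"{service}/{obs['payload']['intent_class']}"  # assumed payload field
    tool = f"{service}/{obs['payload']['tool']}"            # assumed payload field
    ts, trace_id = obs["timestamp"], obs["trace_id"]
    graph.upsert_node(intent, ts, trace_id)
    graph.upsert_node(tool, ts, trace_id)
    graph.upsert_edge(intent, tool, ts, trace_id)
```

Replaying the same observations in the same order always produces the same graph, which is the property that lets the graphs serve as a grounding layer.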
Snapshots and trajectory
- Snapshots. Each graph is snapshotted on a cadence (default: hourly; configurable via APOLLO_GRAPH_SNAPSHOT_INTERVAL) into the Elastic apollo_graph_snapshots index. Snapshots are the substrate for past-vs-current comparison.
- Trajectory. A projection of near-future graph state from current EWMA velocities. Used by Guidance to pre-warm likely-next decisions and by drift detection to establish an expected trajectory.
LLM synthesis (event-driven, primary driver)
Apollo's LLM runs the primary synthesis engine and is event-driven, not scheduled. It is invoked in response to specific observation events — not on a timer, not on a batch threshold. The cadence of synthesis matches the cadence of actual system activity.
Synthesis triggers.
| Trigger | Inbound event |
|---|---|
| Layer 1 sends a request | intent_schema or user_prompt observation ingested |
| Layer 3 returns an output | tool_output, tool_error, or final_response ingested |
| Admin chat turn | POST /api/v1/apollo/chat request |
| Admin-initiated synthesis | POST /api/v1/apollo/learn request |
Other observation types (llm_turn from oracle itself) feed the graphs but do not trigger synthesis on their own — they are intermediate steps between a Layer 1 request and a Layer 3 output. Novel-intent synthesis occurs naturally on the Layer 1 / Layer 3 triggers above; no GET /guidance request from L1 or L3 ever drives synthesis (§Invariants 14).
Inputs on each synthesis call.
- The triggering observation (or chat turn)
- The relevant subgraph state from each decision graph (grounding context)
- Active artifacts that match the observation's intent/tool/service fingerprints
- Recent evaluator scores for matched artifacts
- Prior synthesis output for the same trace_id, if any (for continuity within a request lineage)
Outputs.
- Proposed new artifacts (IntentPattern, FailurePattern, etc.)
- Proposed edits to existing artifacts
- Proposed promotions/demotions
- Drift flags when the LLM itself detects divergence
- Compaction proposals for old observations near TTL
- For admin chat and admin-initiated triggers: a direct response returned to the caller
All outputs are structured Pydantic models. The Curator commits them only after the graph-anchor drift check (below) clears.
Concurrency. A burst of Layer 3 tool outputs (e.g., a fusion run with many tool calls) can trigger many near-simultaneous synthesis calls. Apollo bounds concurrent synthesis via APOLLO_SYNTHESIS_MAX_CONCURRENT (default 4) with a queue of pending triggers. Duplicate triggers within the same trace_id are coalesced: only the latest observation in a lineage is processed.
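One way to picture the bounded, coalescing queue is the asyncio sketch below. It drains a batch rather than running continuously, and the `SynthesisQueue` class and its methods are hypothetical; only the concurrency limit and the latest-wins coalescing rule come from the text above.

```python
import asyncio
from collections import OrderedDict
from typing import Awaitable, Callable, List


class SynthesisQueue:
    def __init__(self, max_concurrent: int = 4) -> None:
        # Mirrors APOLLO_SYNTHESIS_MAX_CONCURRENT (default 4).
        self._max_concurrent = max_concurrent
        self._pending: "OrderedDict[str, dict]" = OrderedDict()

    def submit(self, observation: dict) -> None:
        """Queue a trigger; a newer one replaces any pending one for the same trace_id."""
        trace_id = observation["trace_id"]
        self._pending.pop(trace_id, None)
        self._pending[trace_id] = observation

    async def drain(self, synthesize: Callable[[dict], Awaitable[object]]) -> List[object]:
        """Run synthesis for all pending triggers, at most max_concurrent at a time."""
        semaphore = asyncio.Semaphore(self._max_concurrent)
        batch = list(self._pending.values())
        self._pending.clear()

        async def run(obs: dict) -> object:
            async with semaphore:
                return await synthesize(obs)

        return await asyncio.gather(*(run(obs) for obs in batch))
```

Coalescing happens at submit time, so a burst of tool outputs within one trace costs a single synthesis call once the queue is drained.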
Graph-anchor drift check
The graphs are the anti-drift mechanism. Every LLM synthesis output is validated against the graphs before the Curator commits it:
- Proposed pattern vs. recorded edges. If the LLM proposes "tool X is typically followed by tool Y" but the intent_tool_graph edge X→Y has low weight or is absent, the proposal is flagged.
- Proposed intent classification vs. node clusters. If the LLM introduces an intent class that does not correspond to any node cluster in the graphs, the proposal is flagged.
- Weight swings. If the LLM's proposal would effectively invert a strongly-weighted edge, flagged — even if the LLM's reasoning is plausible, this is exactly the shape of drift.
- Trajectory coherence. If the LLM's proposed trajectory diverges from the graph's EWMA projection, flagged.
Flagged outputs produce a DriftEvent artifact. The Curator does not commit a flagged proposal autonomously — admin review is required via chat or the audit surface. This is how the graphs protect the LLM from itself.
Drift vs. evolution
The graph-anchor check distinguishes:
- Evolution — LLM synthesis outputs consistent with graph trajectory; graph weights shift smoothly as observations accumulate. Proposals are committed autonomously by the Curator.
- Drift — LLM synthesis outputs diverge from graph state; sudden edge-weight swings; emergent nodes appearing faster than configured rate caps. Proposals are held for admin review.
Thresholds are per-graph and configurable (z-score on weight deltas, rate-of-new-nodes caps, divergence tolerance on LLM outputs).
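For the z-score threshold on weight deltas, a minimal sketch might look like the following. The function names and the default threshold of 3.0 are assumptions; actual thresholds are per-graph configuration.

```python
from statistics import mean, pstdev
from typing import Sequence


def weight_delta_zscore(history: Sequence[float], proposed_delta: float) -> float:
    """z-score of a proposed weight delta against recently observed deltas."""
    if len(history) < 2:
        return 0.0  # not enough evidence to call anything drift
    mu, sigma = mean(history), pstdev(history)
    if sigma == 0.0:
        # A perfectly flat history: any different delta is maximally surprising.
        return 0.0 if proposed_delta == mu else float("inf")
    return (proposed_delta - mu) / sigma


def classify(history: Sequence[float], proposed_delta: float,
             z_threshold: float = 3.0) -> str:
    """Label a proposal 'drift' (hold for admin review) or 'evolution' (commit)."""
    z = weight_delta_zscore(history, proposed_delta)
    return "drift" if abs(z) > z_threshold else "evolution"
```

A proposal that moves an edge weight by an amount typical of recent history is classified as evolution; one far outside that spread is held as drift.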
Storage
- Graph nodes and edges live in the Elastic apollo_graph_nodes and apollo_graph_edges indices (UDS-backed, per platform.axonis-core invariant 2).
- A working in-memory mirror of the active graphs is maintained for hot-path reads (guidance, drift detection). The in-memory mirror is derived state; it is always rebuildable from Elastic.
- Snapshots live in apollo_graph_snapshots. Snapshots are immutable after write.
Guidance API (admin inspection only)
Apollo delivers guidance to L1 and L3 LLMs exclusively via the response-attached Injection Channel (see §Injection Channel). L1 and L3 do not pull guidance — there are no GET calls from those layers in the runtime path.
The GET /guidance* endpoints below are retained as admin inspection tools only: admins and admin chat tooling use them to preview what Apollo would currently inject for a given intent, layer, or subscriber. They are gated to role admin via oracle's guardrails (component.oracle.gateway §Guardrails).
L3 operational libraries (no LLM) receive no guidance — they emit observations and are otherwise opaque to Apollo.
Endpoints
GET /api/v1/apollo/guidance?intent=<query>&layer=1|3
GET /api/v1/apollo/guidance/schemas
GET /api/v1/apollo/guidance/tools
GET /api/v1/apollo/guidance/specs
GET /api/v1/apollo/guidance/connections
The top-level /guidance endpoint accepts an intent description (free text or structured) and the consuming layer, and returns a ranked set of applicable artifacts — previewing what Apollo would currently inject. The sub-paths return filtered artifact views by type.
All endpoints require the admin role. L1 and L3 never call them (§Invariants 14).
Example response
{
  "intent_match": {"pattern_id": "ipat_abc", "score": 0.88},
  "schemas": [...],
  "tools": [
    {"name": "fusion_run_start", "description_override": "...", "routing_hint": "parallax"}
  ],
  "specs": [
    {"id": "spec_frag_123", "content": "For federate alignment, ensure lens binding..."}
  ],
  "connections": [
    {"from": "layer1.screening_intent", "to": "parallax.fusion", "confidence": 0.91}
  ]
}
Workflow Generation Hints
Oracle's gateway owns the natural-language→workflow-graph orchestration contract (component.oracle.gateway §workflow-generation). Apollo is the enhanced-generation and guidance half: it shapes what gets generated and surfaces quality hints about a workflow, using its observation and learning layers. Apollo never executes operations and never authors the workflow itself — it injects guidance into the generation path and annotates workflows with advisory hints.
Generation Guidance
When the gateway drives a workflow-generation request, Apollo contributes guidance through the same response-attached Injection Channel it uses for all L1/L3 guidance (§Injection Channel) — it is not a separate pull path.
- #REQ.workflow-gen-guidance — for a workflow-generation intent, Apollo may attach matched guidance artifacts (intent→operation patterns, tool routing hints, learned successful-workflow exemplars) to the generation request, raising the likelihood that the produced node graph is valid and idiomatic. This is advisory: the gateway's generation contract (component.oracle.gateway §workflow-generation.contract) remains the source of truth for request/response shape.
- #REQ.workflow-gen-learning — Apollo observes generated-workflow outcomes (accepted, edited, discarded, execution success/failure) as observations and feeds them to the learner, so generation guidance improves over time. Apollo does not persist per-call generation state beyond its standard observation model.
Workflow Quality Hints
Apollo analyses a (generated or user-built) modelling workflow and emits advisory hints that a frontend can attach to the workflow — flagging issues without blocking the user.
- #REQ.workflow-hints — Apollo's workflow analysis produces hints in three categories: missing best practice, data quality issue, and modeling issue. Each hint is advisory (non-blocking), carries the node/edge it applies to, and a human-readable rationale.
- #REQ.workflow-hints-scope — Apollo's hint scope is the modelling-workflow layer (operation ordering, modelling-step soundness, best-practice gaps). Dataset-level quality analysis routines — computing dataset-quality metrics themselves — are owned elsewhere on the ML surface, not by Apollo; Apollo consumes their signals to phrase a hint but does not implement the dataset-analysis routines.
Injection Channel
Apollo delivers guidance to L1 and L3 LLMs by attaching it to the existing request/response flow — symmetric piggybacking in both directions. There is no separate push transport, no long-lived connection, no service token, no SSE client in production. Guidance is computed at request time (in-process inside oracle, since Apollo lives there) and embedded in the envelope that was already travelling.
- L1 path: oracle attaches current applicable guidance to every /chat response body.
- L3 path: oracle attaches current applicable guidance to every outbound MCP tool dispatch.
Both paths are fresh-per-call by construction — there is no cache to go stale, no reconnect to replay, no disconnected subscriber to reconcile. Apollo lives inside oracle, so fetching guidance for an outbound envelope is an in-process Python call, not a network hop.
Guidance communication is unidirectional: Apollo → L1, Apollo → L3. Subscribers never POST guidance back (observation ingest is a separate path — §Ingest Semantics). Captured as §Invariants 14.
Why response-attached instead of a push channel
L1 is only doing LLM work when composing a response to the user's latest message — the act of calling /chat is what triggers that work. Any guidance change Apollo makes while L1 is idle has nothing to apply to until the next /chat, at which point the response can carry the freshest state. A separate push channel for idle L1 therefore provides no observable benefit and introduces a long-lived auth session to maintain.
L3 agents only exist inside a user-request context (oracle dispatches to them; they validate the forwarded user token). There is no service-token mechanism in axonis-core today (see §Authentication & Authorization). A long-lived L3 connection would require inventing one. Attaching guidance to the MCP dispatch uses the existing user-token-forwarding pattern and delivers guidance exactly when the agent needs it.
L1 path: attached to /chat responses
POST /api/v1/chat is oracle's user-facing chat surface, driven by oracle's own LLM tool-use loop (oracle/server/llm/tool_executor.py, 5 providers: anthropic / openai / groq / ollama / trinity). It is distinct from Apollo's admin chat at POST /api/v1/apollo/chat, which runs Apollo's separate MiniMax LLM for talking to Apollo's synthesis brain.
When oracle responds to a POST /chat, it calls Apollo's in-process apollo.guidance.for_l1(user=..., intent_context=...) before serializing, and embeds the result on the response envelope under apollo_guidance. Beacon-style L1 clients consume that field via their local ApolloGuidanceCache.update(...).
Model extension. Oracle's existing ChatResponse Pydantic model (oracle/server/api/routes.py) must be extended with an optional field:
class ChatResponse(BaseModel):
    response: str
    conversation_id: str
    tool_calls: list = Field(default_factory=list)
    model_used: str = ""
    tokens: dict = Field(default_factory=dict)
    apollo_guidance: dict | None = Field(default=None)  # added by component.oracle.apollo
The field defaults to None, so pre-Apollo clients and responses where guidance is omitted (attach-timeout, Apollo unavailable, empty applicable set) serialize identically to today. Clients that don't know about the field simply ignore it.
Envelope shape when guidance is present:
{
"response": "...assistant reply...",
"conversation_id": "...",
"tool_calls": [...],
"model_used": "...",
"tokens": {...},
"apollo_guidance": {
"as_of": "2026-04-17T10:30:00Z",
"artifacts": [
{
"id": "pshim_xyz",
"type": "PromptShim",
"version": 7,
"content": { ... },
"applicability": { "intent_class": "...", "tags": [...] },
"rationale": "Human-readable explanation of why this artifact is active now."
}
],
"rationale_summary": "3 PromptShim (s1,s2,s3); 2 ToolPairingHint (h1,h2) +1 capped (c1)"
}
}
L1 receives the response, hands apollo_guidance.artifacts to its local ApolloGuidanceCache, and renders the assistant message. The payload is the complete applicable set for this user's L1 scope — not a diff — so cache replacement is strictly idempotent.
On the next user turn, L1 uses the freshly-populated cache to compose its prompt. Guidance staleness is bounded by a single turn.
L2 path: in-process cache for oracle's own chat LLM
Oracle's chat surface at POST /api/v1/chat runs its own LLM tool-use loop in oracle/server/llm/tool_executor.py (5-provider gateway: anthropic / openai / groq / ollama / trinity). Oracle is therefore also a guidance subscriber for its own LLM — distinct from L1 (beacon's LLM) and L3 (cortex/parallax's LLMs).
Because oracle hosts Apollo, no transport is needed. Oracle owns a process-local ApolloGuidanceCache populated directly from apollo.guidance.for_l2(...) (analogous to for_l1 and for_l3_agent) before each LLM turn. The tool-executor consults the cache via the canonical accessors (get_system_prompt_additions, get_spec_fragments, get_active_failure_patterns, get_tool_pairing_hints, get_tool_description_overrides, get_service_connection_hints) on every turn and folds the results into its system prompt and tool-catalog rendering, exactly as L1 and L3 subscribers do.
The L2 path is symmetric with L1/L3 in artifact applicability filtering (scope=l2 on the attacher), in the timeout budget (APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS), and in the failure posture (cache miss / timeout → tool-executor proceeds with no guidance, request still succeeds). It differs in transport only: no JSON serialisation, no envelope traversal — a direct in-process call.
L3 path: attached to MCP tool dispatches
When oracle dispatches a tool call to an L3 agent (component_kind == "agent"), oracle attaches Apollo's currently-applicable guidance inside the tool's arguments dict under the apollo_guidance key, mirroring the existing pattern oracle uses to inject llm_spec into arguments (oracle/server/mcp/server.py). This keeps the JSON-RPC envelope shape unchanged (params stays {name, arguments}) and requires no MCP handler changes on the agent side beyond extracting and applying the new argument:
{
"jsonrpc": "2.0",
"id": 1,
"method": "tools/call",
"params": {
"name": "fusion_run_start",
"arguments": {
"...tool-specific args...": "...",
"apollo_guidance": {
"as_of": "2026-04-17T10:30:00Z",
"artifacts": [ ... ],
"rationale_summary": "..."
}
}
}
}
L3 agent-side MCP handlers extract apollo_guidance from arguments (the same way they currently extract llm_spec), hand it to their local ApolloGuidanceCache for the duration of this request's LLM turns, and strip it before passing the remaining arguments to the tool's business logic. Because L3 only acts inside a user-request context, cache lifetime naturally scopes to the request — no background state, no long-lived connection, no service-token novelty.
L3 libraries (component_kind == "library") do not receive apollo_guidance in their dispatches — oracle filters them out before serialization. Libraries have no LLM to improve (§Invariants 15).
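The attach and strip steps on the L3 path can be sketched as below. This is a minimal illustration, not oracle's actual code: the function names attach_guidance and split_guidance and the run_id tool argument are hypothetical, and only the apollo_guidance key and the component_kind values come from this spec.

```python
from typing import Any, Optional

GUIDANCE_KEY = "apollo_guidance"


def attach_guidance(arguments: dict[str, Any], component_kind: str,
                    guidance: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Oracle side (hypothetical helper): copy the arguments, adding guidance for agents only."""
    out = dict(arguments)
    if component_kind == "agent" and guidance:
        out[GUIDANCE_KEY] = guidance
    return out


def split_guidance(arguments: dict[str, Any]) -> tuple[Optional[dict[str, Any]], dict[str, Any]]:
    """Agent side (hypothetical helper): return (guidance, arguments without guidance)."""
    remaining = dict(arguments)
    guidance = remaining.pop(GUIDANCE_KEY, None)
    return guidance, remaining


guidance = {"as_of": "2026-04-17T10:30:00Z", "artifacts": [], "rationale_summary": ""}

# run_id is a made-up tool argument used only for this demonstration.
agent_args = attach_guidance({"run_id": "r1"}, "agent", guidance)
library_args = attach_guidance({"run_id": "r1"}, "library", guidance)
received, tool_args = split_guidance(agent_args)
```

Copying the dict on both sides keeps the caller's arguments untouched, so the tool's business logic never sees the guidance key.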
Payload shape
apollo_guidance carries:
- as_of: timestamp of the artifact snapshot. Used for traceability and admin debugging.
- artifacts: the currently-applicable artifact set for the subscriber's scope. Each artifact has id, type, version, content, applicability, and rationale (see §Rationale and evidence).
- rationale_summary: structured one-liner naming the attached artifact IDs per type, plus a +N capped (...) tail for artifacts the per-type cap held back. See §Prioritization Layers → Layer 5 for the exact format and the parallel aggregate_artifact_stats query for per-artifact stats.
There is no injection_id, no reason/trigger enum, no subscriber_scope, no evidence_ref on the per-call payload. That metadata lives in the audit log (§Audit log) — attaching it to every response/dispatch would balloon payload size with data that matters to admins, not to LLMs.
Freshness and ordering
Guidance is always at most one turn stale from each subscriber's perspective:
- L1's next /chat call sees the freshest guidance. Between turns, L1's cache reflects the guidance as-of the most recent response.
- L3's MCP dispatch carries guidance computed at the instant oracle is about to call. By construction the agent sees guidance current at dispatch time.
Because the cache is overwritten on every inbound response/dispatch, there is no "subscriber drift" problem to solve — the cache cannot diverge from Apollo.
Triggers (synthesis unchanged)
Apollo's Curator still commits artifact mutations event-driven (§Learner → LLM synthesis). The commits no longer trigger separate push events — they simply become the state that the next attached apollo_guidance payload reflects. Pause/resume of the Curator is therefore also a passive effect: paused Curator → artifact set stops changing → subscribers keep receiving the same state on subsequent calls.
Failure posture
- Apollo slow: oracle's guidance-fetch call has a strict in-process budget (APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS, default 10 ms). On overshoot, oracle serializes the response/dispatch without apollo_guidance. Subscribers proceed without guidance on that turn, which is equivalent to pre-Apollo behavior. No user-visible failure.
- Apollo has no applicable guidance: apollo_guidance is omitted (or null). Subscribers proceed without guidance. Normal state during Phase 1.
- Apollo hallucinates a bad artifact: subscribers apply it on one turn; the Evaluator detects outcome degradation and demotes it; the next attached payload reflects the demotion. Admin can force rollback at any time.
- No network partition risk: Apollo is in-process with oracle. There is no network path between them that can fail.
- Curator paused: attached payloads continue to reflect the state as-of the pause. Subscribers see frozen guidance until resume. Because every response/dispatch still carries the current set, subscribers never lose their guidance due to the pause — it just stops changing.
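The "Apollo slow" and "Apollo unavailable" postures above can be illustrated with a budgeted fetch. This is a sketch under assumptions: the helper name fetch_guidance_with_budget and the thread-pool approach are illustrative choices, not a description of how oracle enforces the budget.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Any, Callable, Optional

ATTACH_TIMEOUT_MS = 10  # mirrors the APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS default

# A shared pool is used on purpose: a per-call `with ThreadPoolExecutor()`
# would wait for the slow call on exit and defeat the budget.
_pool = ThreadPoolExecutor(max_workers=4)


def fetch_guidance_with_budget(
    fetch: Callable[[], Optional[dict[str, Any]]],
    timeout_ms: int = ATTACH_TIMEOUT_MS,
) -> Optional[dict[str, Any]]:
    """Return guidance, or None when the fetch overshoots the budget or fails."""
    future = _pool.submit(fetch)
    try:
        return future.result(timeout=timeout_ms / 1000)
    except FutureTimeout:
        return None  # overshoot: serialize without apollo_guidance
    except Exception:
        return None  # Apollo error: same degraded-but-successful path


def slow_fetch() -> dict[str, Any]:
    time.sleep(0.2)  # simulates a slow guidance lookup
    return {"artifacts": []}


def fast_fetch() -> dict[str, Any]:
    return {"artifacts": []}


def broken_fetch() -> dict[str, Any]:
    raise RuntimeError("apollo unavailable")
```

In every failure branch the caller gets None and serializes the envelope without the field, so the request itself still succeeds.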
Rationale and evidence
Each artifact in the attached payload carries a rationale string (LLM-synthesized for LLM-driven proposals; templated from score decomposition for deterministic Evaluator actions). This is the same rationale written into apollo_audit (§Audit log). Subscribers may log it when applying the artifact to a prompt; admins query it via audit log or admin chat (§Admin Chat).
The fuller evidence_ref (pointers to observations, graph snapshot id, score decomposition, related drift events) is not carried in the per-call payload — it lives in apollo_audit. Admins retrieve it via explain_decision / discuss_decision in admin chat, which resolves the audit record.
Audit
Every Curator action writes an apollo_audit record with action, actor, trigger, rationale, and evidence_ref (§Audit log). Individual deliveries — attached payloads on responses and dispatches — are not audited. Delivery would produce one record per user turn per layer, far too noisy to be useful. The audit captures decisions; deliveries are implementation detail.
Subscriber SDK: ApolloGuidanceCache (pure local cache)
ApolloGuidanceCache in axonis-core is a pure in-memory cache with no transport. It has two surfaces:
Update (called by the subscriber's request handler):
- cache.update(apollo_guidance_block): replaces the cache's artifact set with the payload. Idempotent; the payload is the complete applicable set, not a diff.
Canonical accessors (consumed by the subscriber's LLM-turn codepath):
| Method | Returns | Used at |
|---|---|---|
| get_system_prompt_additions(intent_context) | Ordered list of PromptShim bodies | System-prompt construction |
| get_spec_fragments(intent_context) | List of SpecFragment | RAG-like context insertion |
| get_tool_description_overrides(tool_name) | Override dict or None | Tool-catalog rendering |
| get_tool_pairing_hints(current_tool) | List of ToolPairingHint | After-tool-call reasoning |
| get_active_failure_patterns(intent_context) | List of FailurePattern with diagnostic hints | Pre-call guard; post-call error interpretation |
| get_service_connection_hints(intent_context) | List of ServiceConnectionHint | Service routing |
Applicability filtering happens inside the cache: each artifact's applicability block is matched against the caller's intent_context. When multiple artifacts of the same type match, the SDK returns them ordered by (weight desc, recency desc); merge policy past ordering is the agent's choice.
No HTTP client, no long-lived connection, no authentication inside the SDK — the cache is a data structure inside the subscriber's process. platform.axonis-core invariant 1 (axonis-core has no ML dependencies) is preserved; ApolloGuidanceCache is pure Python data structures.
Empty-cache fallback: if no apollo_guidance has yet been delivered to this subscriber (first call, Apollo off, APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS overshoot), all accessors return empty lists / None. The subscriber proceeds without guidance. This is the safe default pre-Apollo behavior.
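A stripped-down version of the cache behavior described in this section might look like the following. It is a sketch only: the class name GuidanceCacheSketch, the intent_class matching rule, and the content.text field for PromptShim bodies are assumptions, and the recency tie-break is omitted because the payload field it would read is not specified here.

```python
from typing import Any


class GuidanceCacheSketch:
    """Pure in-memory cache: no transport, no auth, replaced wholesale on update."""

    def __init__(self) -> None:
        self._artifacts: list[dict[str, Any]] = []

    def update(self, apollo_guidance_block: dict[str, Any]) -> None:
        # The payload is the complete applicable set, so replace rather than merge.
        self._artifacts = list(apollo_guidance_block.get("artifacts", []))

    def _matching(self, artifact_type: str, intent_context: dict[str, Any]) -> list[dict[str, Any]]:
        wanted = intent_context.get("intent_class")
        hits = [
            a for a in self._artifacts
            if a.get("type") == artifact_type
            # Assumed rule: an artifact with no intent_class applies to every intent.
            and a.get("applicability", {}).get("intent_class") in (None, wanted)
        ]
        # Weight descending; the recency tie-break is left out of this sketch.
        return sorted(hits, key=lambda a: a.get("content", {}).get("weight", 1.0), reverse=True)

    def get_system_prompt_additions(self, intent_context: dict[str, Any]) -> list[str]:
        return [a.get("content", {}).get("text", "")
                for a in self._matching("PromptShim", intent_context)]

    def get_spec_fragments(self, intent_context: dict[str, Any]) -> list[dict[str, Any]]:
        return self._matching("SpecFragment", intent_context)


cache = GuidanceCacheSketch()
empty_result = cache.get_system_prompt_additions({"intent_class": "screening"})

block = {
    "as_of": "2026-04-17T10:30:00Z",
    "artifacts": [
        {"id": "s1", "type": "PromptShim", "content": {"text": "low", "weight": 0.5},
         "applicability": {"intent_class": "screening"}},
        {"id": "s2", "type": "PromptShim", "content": {"text": "high", "weight": 2.0},
         "applicability": {}},
        {"id": "s3", "type": "PromptShim", "content": {"text": "other"},
         "applicability": {"intent_class": "billing"}},
    ],
}
cache.update(block)
cache.update(block)  # applying the same payload twice leaves the cache unchanged
additions = cache.get_system_prompt_additions({"intent_class": "screening"})
```

Because update replaces the whole set, a subscriber never has to reason about partial state or replay order.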
Admin inspection
Admins can preview what Apollo would attach on the next request:
| Method | Path | Purpose |
|---|---|---|
| GET | /api/v1/apollo/guidance?scope=l1 | Preview current L1-scoped artifact set |
| GET | /api/v1/apollo/guidance?scope=l3:<service_name> | Preview current L3-scoped artifact set for a given agent |
| GET | /api/v1/apollo/guidance/stream?scope=<scope> | Admin-only SSE feed of Curator commits in real time (debugging) |
The SSE feed is a debugging aid only — production delivery never uses it. All admin inspection endpoints require the admin role.
Prioritization Layers
The attacher's job is not only to find applicable artifacts but to choose which subset reaches the receiver's LLM. A naive "match everything, send everything" strategy is wrong on two axes: it bloats the receiver's prompt budget once the artifact set grows, and it makes operator-promoted "preferred" artifacts indistinguishable from low-value ones. Apollo's prioritization story is implemented as six cooperating layers, two of which are split into sub-layers (4-A/4-B and 6-A/6-B/6-C); together they make selection observable, quality-aware, and bounded.
The layers are ordered by data dependency — earlier layers don't depend on later ones, and each layer's surface stays useful even if the layers above it are disabled.
Layer 1 — Capped-artifact observability
When the per-type attach cap drops an artifact (see Layer 2 for the cap mechanism itself), each held-back artifact gets a row in apollo_lineage_events with kind: "capped", the artifact's artifact_type, the call's scope and trace_id. Two query paths read these rows:
- query_capped_for_artifact(artifact_id, *, service_name=None, limit=500): lists traces where this artifact was capped.
- aggregate_artifact_stats(artifact_id, *, since=None, limit=1000): returns {attached_count, capped_count, last_attached_at, last_capped_at}.
Both surfaces are exposed on the admin REST API as GET /lineage/capped and GET /artifacts/{artifact_id}/stats. The same lineage rows are also available to the evaluator for "matched-but-shadowed" diagnostics.
Invariant. query_traces_with_artifact and query_trace_attribution filter kind: "capped" out by default — the "applied" semantics of /lineage is unchanged.
Layer 2 — Selection sort key
apollo.guidance.attacher._sort_key orders matched artifacts before the cap fires. Each tier has a default that preserves the previous tier's behavior, so the chain stays well-defined even with sparse data:
| Tier | Source | Default when missing |
|---|---|---|
| 1 | content.evaluator_score | 1.0 (innocent until signaled) |
| 2 | content.confidence | 0.0 (no opinion stated) |
| 3 | applicability specificity (count of populated narrowing fields) | 0 |
| 4 | content.weight | 1.0 |
| 5 | as_of | "" |
evaluator_score defaults to 1.0 to match ArtifactScore.score's baseline (a never-signaled artifact is treated as innocent). confidence defaults to 0.0 because synthesis confidence is an opt-in endorsement — absence means "no opinion." Specificity activates today and is the practical lever when the upper tiers tie; tiers 1 and 2 become load-bearing once their sources flow (see Layers 4-A and 4-B).
Per-type caps live in config:
APOLLO_ATTACH_CAP_PROMPT_SHIM=10
APOLLO_ATTACH_CAP_SPEC_FRAGMENT=5
APOLLO_ATTACH_CAP_TOOL_PAIRING_HINT=5
APOLLO_ATTACH_CAP_FAILURE_PATTERN=10
APOLLO_ATTACH_CAP_SERVICE_CONNECTION_HINT=5
APOLLO_ATTACH_CAP_INTENT_PATTERN=5
ApolloGuidanceCache._sorted on the receiver side uses an identical priority key so the order the sender selected is preserved through to the LLM.
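The five-tier ordering and the per-type cap can be sketched as follows. This is an illustration under assumptions, not the body of _sort_key: the NARROWING_FIELDS list is hypothetical, as_of is assumed to sit on the artifact itself, and every tier is assumed to sort descending.

```python
from typing import Any

# Hypothetical list of applicability fields that count toward specificity.
NARROWING_FIELDS = ("intent_class", "service_name", "tool_name", "tags")


def sort_key(artifact: dict[str, Any]) -> tuple:
    """Tiered key with the documented defaults for missing values."""
    content = artifact.get("content", {})
    applicability = artifact.get("applicability", {})
    specificity = sum(1 for field in NARROWING_FIELDS if applicability.get(field))
    return (
        content.get("evaluator_score", 1.0),  # tier 1: innocent until signaled
        content.get("confidence", 0.0),       # tier 2: no opinion stated
        specificity,                          # tier 3
        content.get("weight", 1.0),           # tier 4
        artifact.get("as_of", ""),            # tier 5: ISO timestamps sort lexicographically
    )


def select(matched: list[dict[str, Any]], cap: int) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Return (attached, capped) for one artifact type."""
    ranked = sorted(matched, key=sort_key, reverse=True)
    return ranked[:cap], ranked[cap:]


a = {"id": "a", "content": {"evaluator_score": 0.4, "confidence": 0.9}}
b = {"id": "b", "content": {}, "applicability": {"tool_name": "fusion_run_start"}}
c = {"id": "c", "content": {}}
attached, capped = select([a, b, c], cap=2)
```

In this example artifact a loses despite its high confidence because tier 1 dominates: its evaluator score of 0.4 sits below the 1.0 default that b and c inherit, and b beats c on specificity.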
Layer 3 — Signal preservation at promote
The promote action's content-extraction helper (_content_from_proposal) strips proposal metadata before storing on the artifact. The three ranking signals (evaluator_score, confidence, weight) must not be added to the metadata strip-list. The constants _METADATA_KEYS and _RANKING_SIGNALS in apollo/curator/actions.py make this contract explicit; TestRankingSignalContract enforces it.
Invariant. If a proposal carries evaluator_score, confidence, or weight at the top level, the promoted artifact's content must carry them too.
Layer 4-A — Evaluator score writeback
apollo/evaluator/persist.py:persist_score_to_artifact writes content.evaluator_score and content.score_decomposition to the artifact document after every signal application in the ingest worker. The write uses a Painless script so that the type-specific content fields (text, signature, etc.) are preserved rather than overwritten.
Properties:
- Fire-and-forget. Never blocks the ingest hot path.
- Idempotent. retry_on_conflict=3 handles concurrent worker writes.
- Kill-switch. APOLLO_EVALUATOR_PERSIST_SCORES_ENABLED=false disables persistence without touching the in-memory engine (audit + cascade paths still work).
- Graceful degradation. Failures are logged and counted (apollo_evaluator_score_persist_failed_total); the in-memory engine remains authoritative.
Layer 4-B — Synthesis confidence
Every synthesis prompt (build_failure_pattern_prompt, build_intent_pattern_prompt, build_prompt_shim_prompt, build_sweep_prompt) requires the LLM to emit a top-level confidence: 0.0..1.0. The _SHARED_RULES block explains the semantics — reserve high confidence for patterns the model would stake its reputation on, because Apollo uses it to rank artifacts at attach time.
apollo/learner/synthesis.py:_normalize_confidence is called from _record_proposal and:
- Clamps values to [0.0, 1.0].
- Coerces missing or unparseable inputs to _NEUTRAL_CONFIDENCE = 0.5 so a malformed LLM response doesn't unfairly downrank an otherwise-valid proposal.
The normalized value rides on the proposal through promote (via the Layer 3 contract) onto artifact.content.confidence, where Layer 2's sort consumes it.
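The clamp-and-coerce behavior described for _normalize_confidence can be sketched like this. The function below is an approximation written for illustration; treating NaN as unparseable is an assumption that the text above does not state.

```python
import math
from typing import Any

_NEUTRAL_CONFIDENCE = 0.5


def normalize_confidence(raw: Any) -> float:
    """Clamp to [0.0, 1.0]; fall back to the neutral value for missing or bad input."""
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return _NEUTRAL_CONFIDENCE  # None, "high", a dict, and similar
    if math.isnan(value):
        return _NEUTRAL_CONFIDENCE  # assumption: NaN counts as unparseable
    return min(1.0, max(0.0, value))
```

Note the difference from the Layer 2 default: an artifact with no confidence field sorts as 0.0, whereas a proposal whose confidence could not be parsed is stored as 0.5.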
Layer 5 — Deepened rationale_summary + per-artifact aggregation
apollo.guidance.attacher._summarize emits a structured summary that names attached and capped artifact IDs per type:
"3 PromptShim (s1,s2,s3); 2 ToolPairingHint (h1,h2) +1 capped (c1)"
ID lists truncate to _SUMMARY_ID_PREVIEW = 5 with a +N tail. Empty input still produces "". Types are sorted alphabetically so summaries diff cleanly across calls.
aggregate_artifact_stats (Layer 1) is the symmetric on-demand summary keyed by artifact rather than by attach call.
Layer 6-A — Artifact embedding at promote
apollo/learner/similarity.py:compute_embedding reuses axonis.memory.embedder.embed (sentence-transformers, gated by the [memory] extra). The vector is stored on content.embedding_vector. Type-aware text extraction handles each artifact type's content shape (PromptShim text, FailurePattern signature+remediation, etc.).
Graceful degradation. When sentence-transformers is unavailable, compute_embedding returns None; the promote still succeeds with no embedding stored. Downstream similarity checks (6-B, 6-C) skip artifacts without embeddings.
Layer 6-B — Promote-time similarity advisory
After the embedding is computed, the promote handler scans active artifacts at the same (type, service_name, tool_name) scope and surfaces matches above the cosine threshold in ActionResult.similar_artifacts. Default threshold: APOLLO_SIMILARITY_THRESHOLD=0.9.
The advisory is informational only — promote still succeeds. Admin chooses whether to demote + supersede the prior(s) by re-promoting with supersede: true and the prior's IDs.
Layer 6-C — Curator-time similarity sweep
apollo/learner/coalescer.py:run_periodic is a fifth background loop alongside snapshot, curator-auto, maintenance, and synthesis-sweep. Each tick:
- Loads all status=active artifacts.
- Partitions by (type, service_name, tool_name).
- Within each partition, union-finds clusters where every pairwise cosine ≥ APOLLO_COALESCER_THRESHOLD (default 0.85, slightly looser than 6-B's promote-time threshold).
- For each cluster, calls Apollo's LLM via build_coalesce_prompt to write a coherent merger.
- Records the merger as a proposal on apollo_proposals with supersedes: [id1, id2, ...] so admin promote demotes the components atomically.
Bounded per sweep: APOLLO_COALESCER_MAX_CLUSTERS_PER_RUN=5 (defensive LLM-cost cap). Off by default (APOLLO_COALESCER_ENABLED=false) — operators opt in once they're ready to budget the LLM calls and review the proposals.
promote() extends the supersede flag's semantics: when the proposal carries supersedes: [...], each listed artifact is demoted alongside the new promote, in the same atomic batch.
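The clustering step inside one partition can be sketched with a small union-find. This is an illustration only: the cluster function is hypothetical, it assumes partitioning has already happened, and it merges on any above-threshold pair (connected components). Union-find on its own does not guarantee that every pair in a cluster clears the threshold, so a stricter all-pairs check would be an extra filter on each component.

```python
import math
from collections import defaultdict


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster(embeddings: dict[str, list[float]], threshold: float = 0.85) -> list[list[str]]:
    """Group artifact ids whose embeddings link at or above the threshold."""
    parent = {artifact_id: artifact_id for artifact_id in embeddings}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    ids = sorted(embeddings)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if cosine(embeddings[a], embeddings[b]) >= threshold:
                parent[find(a)] = find(b)

    groups: dict[str, list[str]] = defaultdict(list)
    for artifact_id in ids:
        groups[find(artifact_id)].append(artifact_id)
    # Singletons have nothing to merge, so only multi-member clusters are returned.
    return sorted(g for g in groups.values() if len(g) > 1)


clusters = cluster({"p1": [1.0, 0.0], "p2": [0.99, 0.1], "p3": [0.0, 1.0]})
```

Each returned cluster would then be handed to the merge prompt, up to the per-run cluster cap.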
Metrics surface
Each layer adds telemetry so operators can see what's happening:
| Metric | Source layer |
|---|---|
| apollo_guidance_attach_null_total{scope, reason} | observability over the attach path's null returns |
| apollo_guidance_attach_success_total{scope} | counterpart counter for successful attaches |
| apollo_guidance_attach_payload_bytes{scope} (histogram) | size growth; operators alert if it bloats |
| apollo_guidance_attach_artifact_count{scope} (histogram) | distribution of artifacts per attach |
| apollo_guidance_attach_capped_total{scope, artifact_type} | per-type drop counts (Layer 2 → 1) |
| apollo_evaluator_score_persisted_total / apollo_evaluator_score_persist_failed_total | Layer 4-A health |
| apollo_coalescer_proposals_emitted_total / apollo_coalescer_merge_failed_total | Layer 6-C health |
A guidance_health block on GET /stats summarizes per-scope success/null breakdown for at-a-glance review.
Disabling layers
Each layer that has a runtime switch can be turned off independently:
APOLLO_GUIDANCE_ATTACH_ENABLED=false # disables Layer 2 + everything above
APOLLO_EVALUATOR_PERSIST_SCORES_ENABLED=false # Layer 4-A
APOLLO_SIMILARITY_ENABLED=false # Layer 6-A + 6-B
APOLLO_COALESCER_ENABLED=false # Layer 6-C (default off)
APOLLO_LINEAGE_PERSIST_ENABLED=false # Layer 1
When disabled, the layer degrades to no-op; the rest of the system keeps running with the next-best signal.
Curator
The Curator is the only component empowered to mutate Apollo's memory. All mutations are bounded and auditable.
Allowed autonomous actions
- Promote an artifact (increase its weight in guidance results)
- Demote an artifact (hide from guidance without deleting)
- Forget an artifact (delete after it has been demoted for N evaluation cycles)
- Edit artifact metadata (tags, applicability, version, human-readable notes)
- Summarize / compact raw observations into a new artifact
Disallowed actions (hard invariants)
- Change auth or guardrails configuration
- Widen or narrow a caller's tool access
- Read or mutate another user's conversation data
- Mint tokens, escalate privileges, or bypass OAuth
- Call backend services on behalf of any user
- Modify or delete audit log records
Versioning
Apollo uses a two-tier versioning model. Artifacts are versioned per-mutation; graphs are captured via snapshots (see §Snapshots and trajectory and §Retention). Both are in place from Phase 1 — versioning is cheap to establish up front and impossible to reconstruct retroactively once Curator empowerment goes live in Phase 3.
Artifacts (IntentPattern, FailurePattern, PromptShim, SpecFragment, ToolPairingHint, ServiceConnectionHint, CapabilityMap, DecisionTrajectory, DriftEvent, IntentSchema, SchemaDrift, PromptShape). Every mutation — autonomous Curator action, admin edit, synthesis-proposed edit, rollback — produces a new version:
- Current version lives in apollo_artifacts.
- Every prior version is copied to apollo_artifact_history before the mutation.
- Each artifact record carries version, prev_version_id, change_reason, actor.
- apollo_artifact_history has no expires_ts: prior versions are retained indefinitely as the rollback substrate (§Retention).
- Rollback: POST /api/v1/apollo/artifacts/{id}/rollback with target version or prev_version_id replaces the current record and writes a new version whose prev_version_id points at the pre-rollback state (so rollback itself is a versioned event, recorded in audit).
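The copy-then-mutate rule and rollback-as-a-new-version can be sketched with an in-memory stand-in for the two indices. Everything here is illustrative: the ArtifactStoreSketch class, the id@vN version-id format, and the assumption that prev_version_id names the record that was current immediately before the write are not taken from the implementation.

```python
import copy
from typing import Any, Optional


class ArtifactStoreSketch:
    def __init__(self) -> None:
        self.current: dict[str, dict[str, Any]] = {}        # stands in for apollo_artifacts
        self.history: dict[str, list[dict[str, Any]]] = {}  # stands in for apollo_artifact_history

    def mutate(self, artifact_id: str, content: dict[str, Any],
               actor: str, change_reason: str) -> dict[str, Any]:
        prev: Optional[dict[str, Any]] = self.current.get(artifact_id)
        if prev is not None:
            # Copy the prior version to history before overwriting it.
            self.history.setdefault(artifact_id, []).append(copy.deepcopy(prev))
        record = {
            "id": artifact_id,
            "version": prev["version"] + 1 if prev else 1,
            # Hypothetical version-id format: "<artifact id>@v<version>".
            "prev_version_id": f"{artifact_id}@v{prev['version']}" if prev else None,
            "content": copy.deepcopy(content),
            "actor": actor,
            "change_reason": change_reason,
        }
        self.current[artifact_id] = record
        return record

    def rollback(self, artifact_id: str, to_version: int, actor: str) -> dict[str, Any]:
        target = next(r for r in self.history[artifact_id] if r["version"] == to_version)
        # Rollback is just another mutation, so it gets its own version number.
        return self.mutate(artifact_id, target["content"], actor, f"rollback to v{to_version}")


store = ArtifactStoreSketch()
store.mutate("pshim_xyz", {"text": "v1"}, "curator_auto", "promote")
store.mutate("pshim_xyz", {"text": "v2"}, "admin:alice", "edit")
rolled = store.rollback("pshim_xyz", 1, "admin:alice")
```

Routing rollback through the same mutate path means history is never rewritten: version 2 stays retrievable after the rollback.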
Graphs (DecisionGraph). Per-observation node/edge mutations are too high-frequency to version individually. Graph rollback uses snapshots instead:
- Hourly snapshots for 7 days, daily for 30 days, weekly for 90 days (per §Retention).
- Admin rollback on a graph restores from a prior snapshot. Coarser granularity than artifact rollback by design.
- Structural mutations initiated by admin or Curator on a graph (e.g., manually forgetting a node, merging two nodes) are tracked as audit events in apollo_audit with before/after snapshot IDs.
Audit log
Every Curator action, Evaluator-driven demotion, drift-hold, upstream artifact re-flag, and admin-chat state mutation writes a record to the Elastic apollo_audit index. The index follows the shared axonis Elastic convention (flat index, UDS shell, expires_ts, delete_by_query cleanup — see §Index mappings and templates and §Retention).
Record schema:
{
"uds": {"timestamp": "...", "username": "...", "visibility": "..."},
"create_ts": "...",
"update_ts": "...",
"schema_version": 1,
"expires_ts": "...", // null if indefinite=true
"action": "promote" | "demote" | "forget" | "edit" | "rollback" | "compact"
| "drift_hold" | "upstream_flag" | "pause_curator" | "resume_curator",
"actor": "curator_auto" | "evaluator_auto" | "admin:<username>",
"trigger": "evaluator_score_below_threshold"
| "l3_performance_cascade"
| "drift_event"
| "admin_manual"
| "synthesis_proposal",
"artifact_id": "...",
"artifact_type": "FailurePattern",
"before_version_id": "...",
"after_version_id": "...",
"related_drift_event_id": "...", // if trigger = drift_event
"evaluator_score": 0.21,
"score_decomposition": { // per §Evaluator outputs
"l3_error": 0.45,
"l3_schema_mismatch": 0.12,
"user_feedback": 0.00,
"evaluator_confidence": 0.08
},
"upstream_artifact_ids": ["ipat_abc", "pshim_xyz"], // flagged artifacts, if cascade
"rationale": "...", // REQUIRED human-readable explanation of WHY this action was taken.
// LLM-synthesized for LLM-driven actions (synthesis proposals, drift flags).
// Templated for deterministic actions (Evaluator-driven demotions —
// composed from score_decomposition and trigger in prose form).
// Always present on every audit record. Distinct from admin_note below.
"evidence_ref": { // pointers to the underlying data the action drew on
"observations": ["obs_...", "obs_..."],
"graph_snapshot_id": "gs_...",
"related_audit_ids": ["audit_..."]
},
"indefinite": false, // set true for critical admin actions
"admin_note": "..." // OPTIONAL admin-supplied justification; separate from rationale
}
Rationale vs. admin_note. rationale is Apollo's own account of why it acted — always present, always auto-generated. admin_note is the admin's own commentary when they take an action — optional, human-supplied. Both are preserved and queryable.
Retention. Default 90 days (configurable via APOLLO_AUDIT_RETENTION_DAYS), enforced by the maintenance task's delete_by_query on expires_ts. Records marked indefinite: true have a null expires_ts and are never deleted — used for critical admin actions (forget of an artifact, pause/resume of Curator, rollback of a versioned artifact). The admin API allows setting indefinite when taking such actions.
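The retention rule reduces to a small calculation, sketched below. The helper names audit_expires_ts and is_expired are hypothetical; the 90-day default and the null expires_ts for indefinite records come from the paragraph above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

APOLLO_AUDIT_RETENTION_DAYS = 90


def audit_expires_ts(create_ts: datetime, indefinite: bool = False,
                     retention_days: int = APOLLO_AUDIT_RETENTION_DAYS) -> Optional[datetime]:
    """Indefinite records get no expiry; everything else expires after the window."""
    if indefinite:
        return None
    return create_ts + timedelta(days=retention_days)


def is_expired(expires_ts: Optional[datetime], now: datetime) -> bool:
    """Mirrors a cleanup filter on expires_ts: a null expiry never matches."""
    return expires_ts is not None and expires_ts <= now


created = datetime(2026, 4, 17, 10, 30, tzinfo=timezone.utc)
expires = audit_expires_ts(created)
```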
Queryable. GET /api/v1/apollo/audit supports filters on time range, action, actor, artifact id, artifact type, trigger, and score-decomposition terms (e.g., "all demotions triggered primarily by L3 errors last 7 days"). Score decompositions let admins see why a score moved without re-deriving it from observations.
Evaluator
The Evaluator scores artifacts based on outcome correlation: after an artifact is published to guidance, do subsequent traces that used it produce better outcomes than traces that did not?
Inputs
- Raw observations (trace outcomes)
- Artifact usage records (which artifacts were returned in guidance, which were incorporated)
- Explicit feedback signals
Failure signals (feeds the evaluator)
An event is considered a failure (negative signal for any artifact associated with its trace) if any of the following:
1. Layer 3 returned an error (HTTP 5xx or tool exception): Layer 3 performance signal. Applies to both agent and library observations. Under oracle-sole-observer (§Invariants 14), the observation is emitted by oracle; the signal keys on the envelope's service field (the observed L3 target), not on who performed the HTTP POST.
2. Output schema mismatched the Layer 1 intent schema: Layer 3 performance signal. Applies only when the observed L3 service is an agent (component_kind == "agent" on its ServiceRegistry record). Libraries have no agent-level intent contract; their outputs are raw CRUD/compute results and schema mismatch is not evaluated for them. The Evaluator looks up component_kind by the envelope's service field at signal-application time: oracle is always the actual emitter, but the service it observed is what the contract keys on.
3. User feedback was negative (thumbs-down, correction, abandoned conversation)
4. Self-assessed evaluator confidence was below threshold
All four feed the Evaluator; signal 2 is gated on component_kind per the above. Weights are configurable via APOLLO_EVALUATOR_WEIGHTS.
Layer 3 performance carries amplified penalty
Signals 1 and 2 both reflect Layer 3 performance — what the backend services actually produced when acting on Apollo's guidance. If Layer 3 components are not performing well, that is a strong indication that the workflow generation (Layer 1 prompts) and the artifacts driving that generation need to be updated.
Accordingly, the Evaluator applies an amplified penalty to Layer 3 performance failures:
- Default weight tiers: L3_performance: 3.0, user_feedback: 1.5, evaluator_confidence: 0.5.
- Sustained L3 underperformance against a given artifact accelerates the Curator lifecycle:
  - Normal demotion cycle requires N=5 below-threshold evaluation cycles before forget.
  - L3-driven demotion triggers after N=2 cycles when signals 1 or 2 dominate the score. Rationale: if services are reliably failing on an artifact's guidance, waiting out a long demotion window lets bad guidance keep shaping traffic.
- When an artifact's score degradation is attributable primarily to Layer 3 signals, the Evaluator additionally flags the upstream artifacts (the IntentPattern, PromptShim, or SpecFragment that shaped the Layer 1 prompt which in turn produced the Layer 3 call) for LLM review on the next synthesis trigger. The synthesis LLM may propose edits to those upstream artifacts, creating a cross-layer correction cycle.
- Repeated L3 failures on the same artifact within a short window escalate to a DriftEvent (not just a score drop), forcing admin review rather than silent demotion.
Weights and thresholds are tunable via env vars (APOLLO_EVALUATOR_WEIGHT_L3_ERROR, APOLLO_EVALUATOR_WEIGHT_SCHEMA_MISMATCH, APOLLO_EVALUATOR_WEIGHT_USER_FEEDBACK, APOLLO_EVALUATOR_WEIGHT_CONFIDENCE, APOLLO_EVALUATOR_L3_FAST_DEMOTE_N).
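The fast-demotion rule can be sketched as below. The weights follow the default tiers listed above, applying the L3_performance tier to both L3 signals. Reading "dominate" as "L3 signals contribute more than half of the weighted penalty" is an assumption, as are the function names.

```python
from typing import Dict

# Default tiers from the text; both L3 signals share the L3_performance weight (assumed).
WEIGHTS = {
    "l3_error": 3.0,
    "schema_mismatch": 3.0,
    "user_feedback": 1.5,
    "evaluator_confidence": 0.5,
}
L3_SIGNALS = {"l3_error", "schema_mismatch"}
NORMAL_DEMOTE_N = 5
L3_FAST_DEMOTE_N = 2  # would come from APOLLO_EVALUATOR_L3_FAST_DEMOTE_N


def penalty_breakdown(counts: Dict[str, int]) -> Dict[str, float]:
    """Weighted penalty per signal for one evaluation window."""
    return {name: WEIGHTS[name] * n for name, n in counts.items()}


def cycles_before_demotion(counts: Dict[str, int]) -> int:
    """Pick the demotion window: fast when L3 signals dominate the penalty."""
    breakdown = penalty_breakdown(counts)
    total = sum(breakdown.values())
    if total == 0:
        return NORMAL_DEMOTE_N
    l3_share = sum(v for k, v in breakdown.items() if k in L3_SIGNALS) / total
    return L3_FAST_DEMOTE_N if l3_share > 0.5 else NORMAL_DEMOTE_N
```

For example, two L3 errors plus one negative feedback event give an L3 share of 6.0 / 7.5, which selects the two-cycle window.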
Outputs
Per-artifact rolling score (exponential moving average). Scores feed the Curator's demote/forget policies. Scores are visible in admin stats and in the audit log when they trigger actions. Score decompositions (per-signal contributions) are retained so admins can see why a score moved — Layer 3 errors vs. user feedback vs. schema mismatch are distinguishable in the audit trail.
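A minimal sketch of a rolling score that keeps its per-signal decomposition follows. The smoothing factor, the mapping of penalties onto a 0..1 observation, and the class name are assumptions; the text only states that the score is an exponential moving average and that decompositions are retained.

```python
from dataclasses import dataclass, field
from typing import Dict, List

ALPHA = 0.2  # assumed smoothing factor; not specified in this document


@dataclass
class ArtifactScore:
    value: float = 1.0
    # One entry per update, so an admin can see which signal moved the score.
    history: List[Dict[str, float]] = field(default_factory=list)

    def update(self, contributions: Dict[str, float]) -> float:
        """Fold one evaluation cycle into the moving average.

        contributions maps signal name to its penalty for this cycle.
        """
        observed = max(0.0, 1.0 - sum(contributions.values()))
        self.value = ALPHA * observed + (1 - ALPHA) * self.value
        self.history.append(dict(contributions))
        return self.value
```

Keeping the raw contributions next to the smoothed value is what lets an audit entry say that a drop came from Layer 3 errors rather than from user feedback.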
Admin Chat
A conversational interface to Apollo, gated by role admin via oracle's existing guardrails (component.oracle.gateway §Guardrails).
POST /api/v1/apollo/chat
Request body mirrors oracle's /chat:
{
"message": "Forget everything Apollo learned about cohort X last week",
"conversation_id": "apollo_admin_sess_...",
"model": "default"
}
The admin chat uses Apollo's own LLM (separate from oracle's primary LLM) with a set of memory-management tools:
- list_memories(filter)
- get_memory(id)
- forget_memory(id)
- promote_artifact(id) / demote_artifact(id)
- rollback_artifact(id, to_version)
- rollback_graph(graph_id, to_snapshot)
- trigger_synthesis(trace_id?)
- explain_decision(trace_id | artifact_id | audit_id) — returns the rationale + evidence_ref for a Curator action
- list_decisions(artifact_id?, since?, trigger?) — audit-filtered view of recent Curator actions
- discuss_decision(artifact_id | audit_id) — opens a focused conversation thread: Apollo's LLM replies with the stored rationale, walks through the evidence (graph snapshot, score decomposition, upstream artifacts), and answers admin follow-ups. The admin can invoke promote/demote/rollback/forget tools inline in the same thread to act on the finding.
- pause_curator() / resume_curator()
Admin ↔ Apollo conversation
Every Curator action carries a rationale written by Apollo at commit time and persisted in apollo_audit. Admin chat is the surface where those rationales become conversational: an admin asks "why did you just demote pshim_xyz?", Apollo's LLM retrieves the relevant audit record, reads out the rationale and evidence, and answers follow-up questions by re-reading the underlying observations and graph state.
This means admin chat is not just a command console — it is the review surface for Apollo's own findings. Admins can probe rationales, challenge them, and issue corrections (rollback, forget, edit, pause) without leaving the conversation. Every follow-up action is itself audited with actor: "admin:<username>" and a fresh rationale — so the admin's chain of reasoning is preserved in the audit log alongside Apollo's.
All admin-chat actions are logged to the audit index with actor: "admin:<username>".
Non-admin users cannot reach /api/v1/apollo/chat. Their interaction with Apollo is purely transitive, through oracle.
Endpoints
REST (mounted under oracle's /api/v1/apollo/)
| Method | Path | Who | Purpose |
|---|---|---|---|
| POST | /api/v1/apollo/observations | Admin + out-of-process services | Admin replay/seed, plus the fallback ingest path for services outside oracle's MCP dispatch reach. Phase-1 emitters (oracle + cortex) do not use this endpoint — oracle emits on their behalf in-process (§Ingest Semantics). |
| GET | /api/v1/apollo/guidance?scope=l1 | Admin | Preview current L1-scoped artifact set |
| GET | /api/v1/apollo/guidance?scope=l3:<service> | Admin | Preview current L3-scoped artifact set for an agent |
| GET | /api/v1/apollo/guidance/schemas | Admin | Inspect learned intent schemas |
| GET | /api/v1/apollo/guidance/tools | Admin | Inspect tool descriptions / routing hints |
| GET | /api/v1/apollo/guidance/specs | Admin | Inspect spec fragments |
| GET | /api/v1/apollo/guidance/connections | Admin | Inspect service-connection hints |
| GET | /api/v1/apollo/guidance/stream?scope=<scope> | Admin | Real-time SSE feed of Curator commits (debugging only) |
| POST | /api/v1/apollo/chat | Admin | Conversational admin interface |
| GET | /api/v1/apollo/memories | Admin | List observations with filters |
| GET | /api/v1/apollo/memories/{id} | Admin | Inspect one observation |
| POST | /api/v1/apollo/memories | Admin | Seed an observation manually |
| PATCH | /api/v1/apollo/memories/{id} | Admin | Edit metadata (tags, notes) |
| DELETE | /api/v1/apollo/memories/{id} | Admin | Forget |
| GET | /api/v1/apollo/artifacts | Admin | List learned artifacts |
| GET | /api/v1/apollo/artifacts/{id} | Admin | Inspect one artifact + version history |
| PATCH | /api/v1/apollo/artifacts/{id} | Admin | Edit |
| POST | /api/v1/apollo/artifacts/{id}/promote | Admin | Promote |
| POST | /api/v1/apollo/artifacts/{id}/demote | Admin | Demote |
| POST | /api/v1/apollo/artifacts/{id}/rollback | Admin | Revert to a prior version |
| DELETE | /api/v1/apollo/artifacts/{id} | Admin | Forget |
| GET | /api/v1/apollo/audit | Admin | Query audit log |
| POST | /api/v1/apollo/learn | Admin | Manually trigger an Apollo synthesis pass |
| GET | /api/v1/apollo/stats | Admin | Apollo's own observability (counts, timings, scores) |
MCP (admin chat tools)
Apollo's MCP tools mirror the admin CRUD surface, exposed only to Apollo's own admin chat LLM (§Admin Chat) — not aggregated into oracle's user-facing /agentspace MCP catalog. The tools are served from a private MCP endpoint mounted by oracle.apollo.chat.server and reachable only through the admin-chat conversation; they are never visible to L1 or L3 LLMs.
- apollo_list_memories, apollo_get_memory, apollo_forget_memory
- apollo_list_artifacts, apollo_get_artifact, apollo_promote_artifact, apollo_demote_artifact, apollo_rollback_artifact, apollo_forget_artifact
- apollo_list_graphs, apollo_get_graph_snapshot, apollo_rollback_graph
- apollo_query_audit
- apollo_trigger_synthesis
- apollo_list_decisions, apollo_explain_decision, apollo_discuss_decision
- apollo_pause_curator, apollo_resume_curator
- apollo_stats
Authentication & Authorization
- Admin endpoints require admin role via oracle's OAuth middleware + guardrails (component.oracle.gateway).
- Guidance GET endpoints are admin-only. L1 and L3 never call them. They exist for admin inspection of what Apollo would currently inject.
- Secondary ingest path (POST /api/v1/apollo/observations) accepts either the admin's Bearer token (for replay/seed) or, for any out-of-process emitter, the user's forwarded Bearer token — the same token oracle forwards downstream in its existing cross-service calls. There is no service-token infrastructure in axonis-core today; every cross-service call in the stack forwards the user's Keycloak-issued token (verified end-to-end against JWKS). Admin replay/seed additionally requires the admin role. Phase-1 emitters do not exercise this path.
- Oracle's primary in-process path (all L1-relayed events + oracle's own llm_turn + oracle-observed L3 tool_output / tool_error + final_response) bypasses network auth — it is a direct function call within the same process, already authenticated at the ingress by OAuthMiddleware.
- Neither L1 nor L3 authenticates to Apollo — neither layer addresses Apollo on any path (ingest or guidance). Both talk to oracle; oracle handles Apollo (§Invariants 14).
- Injection channel (response-attached) rides the ambient auth of the envelope it is embedded in. The /chat response is already authenticated per the inbound /chat request; the outbound MCP dispatch is already authenticated per oracle's forwarded token. No additional auth layer is introduced for attached guidance.
- Admin SSE debug feed uses the same OAuthMiddleware on connection handshake and is gated to the admin role.
- Apollo honors all oracle guardrails. Curator cannot widen a caller's tool access. Attached guidance that references tools a subscriber cannot use is filtered out before the envelope is serialized.
- Deferred: once a Keycloak client-credentials grant is introduced for service-to-service auth (noted in component.oracle.gateway as pending), APOLLO_SERVICE_TOKEN-authenticated ingest from background/batch workers becomes possible. Until then, ingest without a user token context is not supported.
Ingest Semantics
Observation ingest has two paths. The primary path, used by every Phase-1 emitter (oracle + cortex), is in-process only — oracle observes the envelopes flowing across its own boundaries and calls oracle.apollo.observer.ingest directly. The secondary path is the HTTP POST endpoint, mounted for admin replay/seed and for future services running outside oracle's MCP dispatch reach.
Primary path: in-process emission by oracle
Per §Invariants 14, neither L1 nor L3 addresses Apollo directly. Oracle is Apollo's sole emitter in production. On every inbound /chat request, oracle extracts L1 signals from the request body and calls the observer in-process. On every outbound MCP dispatch, oracle observes the round-trip and emits in-process on the L3 service's behalf:
| Event(s) | Emitted when | Emitter call site |
|---|---|---|
| intent_schema, user_prompt, user_feedback | /chat request arrives or a feedback submission is posted | oracle/server/api/routes.py |
| llm_turn | oracle's own LLM request/response cycle completes | oracle/server/llm/tool_executor.py |
| tool_output, tool_error | an outbound MCP dispatch to an L3 service returns | oracle/server/llm/tool_executor.py + oracle/server/mcp/server.py (proxy path) |
| final_response | oracle is about to return the /chat response body | oracle/server/api/routes.py |
All emissions flow through helpers in oracle/apollo/hooks/chat.py which enqueue the envelope on the in-process async queue via oracle.apollo.observer.ingest.ingest(...). No network call. No authentication layer (the helpers live inside oracle's process, authenticated at the ingress by OAuthMiddleware). Failure modes are purely local: a full queue increments apollo_ingest_queue_dropped_total; an observer exception is caught and logged so the user request is unaffected.
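A minimal sketch of such an emission helper, assuming a plain asyncio.Queue and a Counter standing in for the metrics client. The name emit and the logger name are hypothetical; the real helpers live in oracle/apollo/hooks/chat.py and may look different.

```python
import asyncio
import logging
from collections import Counter

log = logging.getLogger("apollo.hooks")  # stand-in for the axonis logger
COUNTERS = Counter()                     # stand-in for the metrics client


def emit(queue: asyncio.Queue, envelope: dict) -> bool:
    """Enqueue an envelope without blocking and without raising.

    Returns True when the envelope was queued, False when it was dropped.
    """
    try:
        queue.put_nowait(envelope)
        COUNTERS["apollo_ingest_accepted_total"] += 1
        return True
    except asyncio.QueueFull:
        # A full queue is counted, never silent.
        COUNTERS["apollo_ingest_queue_dropped_total"] += 1
        return False
    except Exception:
        # Any other observer failure is logged so the user request is unaffected.
        log.exception("apollo emit failed; continuing request")
        return False
```

The helper is synchronous on purpose: put_nowait never awaits, so calling it from a request handler adds no suspension point to the hot path.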
Secondary path: HTTP POST (admin replay + out-of-process services)
The POST /api/v1/apollo/observations endpoint remains mounted on oracle's Starlette app for two use cases:
- Admin replay/seed — an admin manually re-ingests observations (e.g., to backfill after an outage or to seed synthetic test data). Requires the admin role.
- Services outside oracle's MCP dispatch reach — any future service whose outputs are not observable through an oracle-mediated MCP round-trip can emit via ApolloClient. None of the Phase-1 emitters use this path.
Endpoint:
POST /api/v1/apollo/observations
Content-Type: application/json
Authorization: Bearer <user-token> # admin token for replay, or the user's forwarded token for out-of-process services
traceparent: 00-<trace-id>-<parent-span-id>-<flags>
{ "observations": [<envelope>, ...] }
A single envelope is always valid; the array form enables batching on the client. Apollo responds 202 Accepted as soon as every envelope is placed on the in-process queue. Per-envelope validation happens inside the background worker and is logged (not bubbled to the caller) so a single bad envelope does not fail a batch.
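The per-envelope validation step in the worker might look like the sketch below, which separates valid envelopes from rejected ones instead of failing the whole batch. The required-field list is an assumption borrowed from the dedupe key used elsewhere in this section; the actual envelope schema is defined outside this section.

```python
from typing import Dict, List, Tuple

# Assumed minimum field set; taken from the dedupe key, not from an envelope schema.
REQUIRED_FIELDS = ("trace_id", "event_type", "timestamp", "service")


def split_valid(envelopes: List[Dict]) -> Tuple[List[Dict], List[Tuple[Dict, List[str]]]]:
    """Return (valid, rejected); rejected entries carry the missing field names."""
    valid, rejected = [], []
    for env in envelopes:
        missing = [name for name in REQUIRED_FIELDS if name not in env]
        if missing:
            # The worker would log this instead of bubbling it to the caller.
            rejected.append((env, missing))
        else:
            valid.append(env)
    return valid, rejected
```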
The HTTP POST is a fire-and-accept call. Because Apollo's request handler does nothing but enqueue, the server-side operation is a local memory write — never a WAN hop inside the request. Client-side timeouts can therefore be generous (default 30 s) without risking silent drops from network jitter: the handler always responds in sub-millisecond time on a healthy Apollo.
Client-side helper: ApolloClient
ApolloClient in axonis-core is the HTTP client used by the secondary path. Phase-1 services (oracle + cortex) do not import it — oracle emits in-process and cortex emits nothing at all. ApolloClient is retained so admin tooling and any future out-of-process emitter can reach the endpoint without a bespoke HTTP client.
ApolloClient.emit(envelope) does a single httpx.AsyncClient.post with:
- A generous request timeout (APOLLO_INGEST_POST_TIMEOUT_SEC, default 30).
- Bounded retries with exponential backoff + jitter on transient failures (APOLLO_INGEST_RETRY_ATTEMPTS, default 2; APOLLO_INGEST_RETRY_BASE_MS, default 200; APOLLO_INGEST_RETRY_CAP_MS, default 2000). Transient = timeout, 5xx, 429, connection error. 4xx except 429 is not retried.
- Client-side batching via a size-or-interval hybrid: APOLLO_INGEST_BATCH_SIZE (default 50) or APOLLO_INGEST_FLUSH_INTERVAL_MS (default 500), whichever first.
- Lifecycle flush on process shutdown (signal handler + atexit) and on explicit ApolloClient.flush() calls.
ApolloClient is pure HTTP — the same shape as axonis-core's RestClient and MCPClient (axonis_core/gateway/). No new transport primitive is introduced.
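The retry policy described above can be sketched without any HTTP dependency. The defaults mirror the documented knobs. The "full jitter" variant (a uniform draw between zero and the capped exponential ceiling) is an assumption, since the text only says "exponential backoff + jitter", and both function names are hypothetical.

```python
import random
from typing import Callable, Optional


def is_transient(status: Optional[int]) -> bool:
    """Classify a failed POST; status is None for timeouts and connection errors."""
    return status is None or status == 429 or 500 <= status <= 599


def backoff_delay_ms(
    attempt: int,
    base_ms: int = 200,    # APOLLO_INGEST_RETRY_BASE_MS
    cap_ms: int = 2000,    # APOLLO_INGEST_RETRY_CAP_MS
    rng: Callable[[], float] = random.random,
) -> float:
    """Delay before retry number `attempt` (1-based), using full jitter."""
    ceiling = min(cap_ms, base_ms * 2 ** (attempt - 1))
    return rng() * ceiling
```

Passing rng explicitly keeps the schedule testable: with rng fixed at 1.0 the ceilings are 200, 400, 800, 1600, then 2000 ms once the cap applies.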
Server side: in-process async queue
Apollo's ingest handler is thin:
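import asyncio
from starlette.responses import JSONResponse

async def ingest_handler(request):
    # Enqueue only; normalization and validation run later in the background workers.
    envelopes = parse_body(request)
    for env in envelopes:
        try:
            _INGEST_QUEUE.put_nowait(env)
            metrics.incr("apollo_ingest_accepted_total", service=env.service)
        except asyncio.QueueFull:
            # Queue is at APOLLO_INGEST_QUEUE_MAXSIZE: count the drop and keep serving.
            metrics.incr("apollo_ingest_queue_dropped_total", service=env.service)
    # 202 is returned even when envelopes were dropped; drops surface via the counter.
    return JSONResponse({"accepted": len(envelopes)}, status_code=202)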
The queue is bounded by APOLLO_INGEST_QUEUE_MAXSIZE (default 10000). When the queue fills, put_nowait raises QueueFull and Apollo increments apollo_ingest_queue_dropped_total — the failure is never silent, visible on /stats under degraded_emitters.
A pool of background worker coroutines (APOLLO_INGEST_WORKER_CONCURRENCY, default 4) drains the queue. Each worker performs the full ingest: normalize → write to apollo_observations → update graphs → dispatch synthesis triggers per §Learner. Worker failures are logged and the envelope is reprocessed on a bounded retry budget (APOLLO_INGEST_WORKER_RETRY_ATTEMPTS, default 2) before being moved to a dead-letter log (APOLLO_INGEST_DEAD_LETTER_PATH, optional JSONL file; unset by default).
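A sketch of the worker pool with a bounded retry budget and an optional JSONL dead-letter file follows. It treats APOLLO_INGEST_WORKER_RETRY_ATTEMPTS as the number of retries after the first attempt, which is an assumption. The process callable stands in for the normalize, write, graph-update, and trigger steps, and run_pool is a hypothetical helper that stops once the queue is drained.

```python
import asyncio
import json


async def worker(queue, process, retry_attempts=2, dead_letter_path=None):
    """Drain the queue forever; each envelope gets 1 + retry_attempts tries."""
    while True:
        env = await queue.get()
        try:
            for attempt in range(retry_attempts + 1):
                try:
                    await process(env)
                    break
                except Exception:
                    if attempt < retry_attempts:
                        continue
                    if dead_letter_path:
                        # Blocking append is fine for a sketch; a real worker might offload it.
                        with open(dead_letter_path, "a", encoding="utf-8") as fh:
                            fh.write(json.dumps(env) + "\n")
        finally:
            queue.task_done()


async def run_pool(queue, process, concurrency=4, **worker_kwargs):
    """Start `concurrency` workers, wait until the queue is drained, then stop them."""
    tasks = [
        asyncio.create_task(worker(queue, process, **worker_kwargs))
        for _ in range(concurrency)
    ]
    await queue.join()
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
```

In a long-running service the workers would stay alive instead of being cancelled after one drain; the join-then-cancel step is only there to make the sketch terminate.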
Failure visibility
No silent failure modes exist on the ingest paths — primary (oracle in-process) and secondary (HTTP POST). Every failure kind is counted. The {service} label is the envelope's service field — the observed L3 target for Phase-1 emissions (oracle is the actual emitter but per-service visibility is what operators need).
| Metric | Meaning |
|---|---|
| apollo_ingest_accepted_total{service} | Envelopes successfully enqueued (both paths) |
| apollo_ingest_queue_dropped_total{service} | Envelopes rejected because the queue was full (both paths) |
| apollo_ingest_post_failure_total{service, kind} | Secondary-path POST failures after retries exhausted (timeout / 5xx / etc.). Never fires for Phase-1 emitters (they go in-process). |
| apollo_ingest_worker_failure_total{service} | Background-worker failures after retries (moved to dead-letter) — applies to both paths |
| apollo_ingest_queue_depth | Current depth of the in-process queue |
| apollo_ingest_last_ingest_ts{service} | Timestamp of last successful enqueue per service — covers both oracle's in-process call and secondary-path POSTs |
| apollo_ingest_last_drain_ts{service} | Timestamp of last successful worker drain per service |
A service that should be active but whose last_ingest_ts is older than APOLLO_INGEST_STALE_WARN_SEC (default 300) is surfaced on /stats under degraded_emitters, as is Apollo itself when queue_depth exceeds APOLLO_INGEST_DEPTH_WARN (default 5000). For Phase-1 services, "degraded" means oracle stopped observing them (e.g., oracle hasn't dispatched an MCP call to cortex in five minutes), not that a POST failed.
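The staleness and depth checks can be sketched as one pure function. The expected_active set, the return shape, and the "apollo" key used for the queue-depth case are assumptions; the text does not specify how /stats formats degraded_emitters.

```python
import time
from typing import Dict, Iterable, Optional


def degraded_emitters(
    last_ingest_ts: Dict[str, float],
    expected_active: Iterable[str],
    queue_depth: int,
    now: Optional[float] = None,
    stale_warn_sec: float = 300,  # APOLLO_INGEST_STALE_WARN_SEC
    depth_warn: int = 5000,       # APOLLO_INGEST_DEPTH_WARN
) -> Dict[str, str]:
    """Map each degraded emitter to the reason it was flagged."""
    now = time.time() if now is None else now
    degraded = {}
    for service in expected_active:
        ts = last_ingest_ts.get(service)
        # Never seen, or not seen recently enough.
        if ts is None or now - ts > stale_warn_sec:
            degraded[service] = "stale"
    if queue_depth > depth_warn:
        # Queue depth is a property of Apollo itself, not of one service.
        degraded["apollo"] = "queue_depth"
    return degraded
```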
Dedup on at-least-once delivery
Client retries can produce duplicate envelopes. Apollo's observer deduplicates on (trace_id, event_type, timestamp, service) within APOLLO_INGEST_DEDUPE_WINDOW_SEC (default 300) before writing to Elastic.
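A minimal in-memory sketch of windowed dedupe on that key. The class name and the linear eviction pass are illustrative assumptions; a production observer may track seen keys differently.

```python
from typing import Dict, Tuple


class Deduper:
    """Remember envelope keys for a fixed window and flag repeats."""

    def __init__(self, window_sec: float = 300):  # APOLLO_INGEST_DEDUPE_WINDOW_SEC
        self.window_sec = window_sec
        self._seen: Dict[Tuple, float] = {}  # key -> time first seen

    def is_duplicate(self, env: dict, now: float) -> bool:
        # Evict keys that have aged out of the window.
        cutoff = now - self.window_sec
        self._seen = {k: t for k, t in self._seen.items() if t >= cutoff}
        key = (env["trace_id"], env["event_type"], env["timestamp"], env["service"])
        if key in self._seen:
            return True
        self._seen[key] = now
        return False
```

Because the key includes the envelope's own timestamp, a client retry that resends the identical envelope is caught, while a genuinely new event on the same trace is not.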
Config knobs (all prefixed APOLLO_)
| Env var | Default | Purpose |
|---|---|---|
| APOLLO_INGEST_BATCH_SIZE | 50 | Max envelopes per POST body |
| APOLLO_INGEST_FLUSH_INTERVAL_MS | 500 | Max time an envelope waits in client buffer before flushing |
| APOLLO_INGEST_POST_TIMEOUT_SEC | 30 | Per-POST HTTP timeout — generous, since the server handler is in-memory only |
| APOLLO_INGEST_RETRY_ATTEMPTS | 2 | Bounded client retries on transient failure |
| APOLLO_INGEST_RETRY_BASE_MS | 200 | Base delay for exponential backoff |
| APOLLO_INGEST_RETRY_CAP_MS | 2000 | Max delay between retries |
| APOLLO_INGEST_QUEUE_MAXSIZE | 10000 | Server-side in-process queue capacity |
| APOLLO_INGEST_WORKER_CONCURRENCY | 4 | Number of background worker coroutines draining the queue |
| APOLLO_INGEST_WORKER_RETRY_ATTEMPTS | 2 | Bounded worker retries before dead-letter |
| APOLLO_INGEST_DEAD_LETTER_PATH | unset | Optional JSONL path for envelopes moved to dead-letter after worker retries exhausted |
| APOLLO_INGEST_STALE_WARN_SEC | 300 | Seconds without a successful enqueue (in-process or POST) before an expected-active service is flagged |
| APOLLO_INGEST_DEPTH_WARN | 5000 | Queue-depth threshold for surfacing Apollo itself as degraded on /stats |
| APOLLO_INGEST_DEDUPE_WINDOW_SEC | 300 | Window for (trace_id, event_type, timestamp, service) dedupe on at-least-once delivery |
Layer 1 Intent Schema Obligation
Layer 1 is expected but not required to emit an intent_schema observation with each request. The obligation is best-effort throughout Phase 1 and Phase 2, with a configurable path to required once Layer 1's schema contracts stabilize.
Best-effort mode (default)
- Layer 1 SHOULD include an intent_schema block in every /chat request body it sends to oracle. Oracle extracts the block and emits the intent_schema observation to Apollo in-process (§Invariants 14 — L1 never addresses Apollo). A request without the block is still served; oracle simply emits no intent_schema observation for that trace.
- If a schema is present on a trace, graph nodes are typed explicitly and the schema_mismatch failure signal (§Evaluator signal 2) is active for that trace.
- If a schema is absent, Apollo's extractors fall back to prompt-inference and mark the resulting nodes inferred=true. Drift detection and evaluator confidence weight inferred nodes lower. The schema_mismatch signal is not evaluated for that trace; the L3-performance penalty (§Evaluator) still fires on signal 1 (hard errors), but signal 2 is dark.
- GET /api/v1/apollo/stats reports intent_schema_coverage — percentage of traces with a Layer 1 schema in the last rolling window — so admins can see when Layer 1 coverage is high enough to flip to required.
Required mode
- APOLLO_REQUIRE_INTENT_SCHEMA=true flips behavior: oracle rejects inbound /chat requests whose body lacks an intent_schema block with a 400 at the ingress — L1 is the direct caller and sees the rejection. Traces without a schema are never created; nothing to drop at the Observer layer.
- The flip is a config change, not a code change. No Apollo, oracle, or L1 redeploy is needed — but Layer 1's /chat emission behavior must already include the schema or the flip will start rejecting real traffic.
- Phase 3 is the expected time to flip, once Curator empowerment demands the cleaner signal. Admin can flip earlier if stats show high coverage.
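The two modes and the coverage statistic can be sketched as follows. The function names and the (status, emit) return shape are assumptions for illustration; only the 400 rejection and the intent_schema_coverage percentage come from the text.

```python
from typing import Dict, List, Tuple


def _has_schema(body: Dict) -> bool:
    return body.get("intent_schema") is not None


def check_intent_schema(body: Dict, require: bool) -> Tuple[int, bool]:
    """Return (http_status, emit_intent_schema_observation) for a /chat body.

    `require` mirrors APOLLO_REQUIRE_INTENT_SCHEMA.
    """
    present = _has_schema(body)
    if require and not present:
        return 400, False   # required mode: reject at the ingress
    return 200, present     # best-effort mode: serve, emit only when present


def intent_schema_coverage(bodies: List[Dict]) -> float:
    """Percentage of traces in the window that carried a Layer 1 schema."""
    if not bodies:
        return 0.0
    return 100.0 * sum(1 for b in bodies if _has_schema(b)) / len(bodies)
```

An admin deciding whether to flip the flag would compare the coverage figure against whatever threshold the team considers safe; the spec leaves that threshold open.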
Logging
Every Apollo module and every service participating in Apollo's
observation / injection loop uses the axonis-core logger rather than a
module-local logging.getLogger() call. The logger module is
axonis.logger, which implements the three-logger convention (log,
error, audit) with consistent handler shapes so logs from any
component read coherently when aggregated.
Three loggers, three audiences
| Logger | When to use | Destination |
|---|---|---|
| log | Normal operational telemetry — info, warning, debug. | Console + axonis.log |
| error | Exceptions, permanent failures, data-loss events, misconfiguration. | Console + error.log |
| audit | Important transactions that must be traceable independently of volume. | audit.log (file only) |
Import pattern:
from axonis.logger import log, error, audit
What counts as audit-worthy
Apollo MUST route the following transactions through the audit logger
so they leave a trail in audit.log separate from regular operational
noise:
- Worker pool start / shutdown / cancellation (§Ingest Semantics).
- Graph snapshot completion (§Snapshots and trajectory) — per hour.
- Every Curator action — promote, demote, forget, edit, rollback, compact, drift-hold, upstream-flag, pause_curator, resume_curator. (Complementary to the apollo_audit Elastic index: the audit log captures the event as structured text alongside other platform audit events; the Elastic index is the queryable, structured source of truth.)
- LLM synthesis proposals that result in a Curator commit (the proposal → drift-check → commit boundary).
- DriftEvent creation (§Graph-anchor drift check).
- Admin chat actions that mutate state, logged with actor: "admin:<username>".
- Guidance injection commits — every push of apollo_guidance onto an outbound envelope is audit-worthy at the commit level, though the per-turn attachment is a delivery detail (not audited).
- Subscriber connection / disconnection events on the admin SSE debug feed.
What stays in log / error
- Per-observation ingest (log.info / log.debug) — too high-volume for audit.
- Per-attach-turn emissions — ditto.
- Retry attempts, transient failures — log.warning.
- Timeouts on the attach path (graceful degradation) — log.warning.
- Queue overflow, exhausted retries, worker failures — error.
- Exceptions swallowed by the hot path — error so they still land in error.log without propagating into the request path.
Rationale
Splitting the three channels keeps audit.log the single place an
operator or admin-chat tool can scan when investigating a system-level
state change without being drowned in routine telemetry. Separating
error.log keeps every permanent-failure signal (data loss, persistent
outage, contract violation) in one place regardless of which module it
came from.
Failure Posture
- Apollo slow on attach: the in-process guidance fetch is bounded by APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS (default 10 ms). On overshoot, oracle serializes the /chat response or MCP dispatch without apollo_guidance. Subscribers proceed without guidance on that turn — equivalent to pre-Apollo behavior. User sees no failure; metric apollo_guidance_attach_timeout_total surfaces the event.
- Apollo unreachable as a process: since Apollo is a package inside oracle, "Apollo unreachable" means oracle is itself broken, which is a larger incident. If the Apollo module fails to import or initialize at startup, oracle continues serving /chat and tool dispatches without apollo_guidance attached. Ingest endpoint returns 503.
- Ingest queue full: POST /api/v1/apollo/observations responds 202 but increments apollo_ingest_queue_dropped_total{service}. Never silent — visible on /stats under degraded_emitters.
- Ingest client POST fails: client retries within budget, then drops the batch and increments apollo_ingest_post_failure_total{service, kind}. Visible on /stats. Emitter's task continues unaffected (observations are telemetry, not transactional).
- Apollo worker crashes mid-ingest: at-least-once redelivery from the asyncio queue; observer dedupes on (trace_id, event_type, timestamp, service).
- Apollo hallucinates a bad artifact: subscribers apply it on one turn; the Evaluator detects outcome degradation on subsequent observations and demotes; the next attached payload reflects the demotion. Admin can force-rollback via audit log at any time.
- Curator goes rogue: every action (mutation + commit) is in the audit log; admin can pause_curator() immediately via chat or CLI. Paused Curator → artifact set stops changing → attached payloads continue to reflect the as-of-pause state until resume.
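The "slow on attach" posture can be sketched with asyncio.wait_for. The function name attach_guidance, the fetch_guidance callable, and the plain dict used as a metrics stand-in are assumptions for this example.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional


async def attach_guidance(
    response: Dict,
    fetch_guidance: Callable[[], Awaitable[object]],
    timeout_ms: float = 10,  # APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS
    counters: Optional[Dict[str, int]] = None,
) -> Dict:
    """Attach apollo_guidance when the fetch finishes in time, else return as-is."""
    try:
        guidance = await asyncio.wait_for(fetch_guidance(), timeout=timeout_ms / 1000)
    except asyncio.TimeoutError:
        # Degrade to pre-Apollo behavior and record the overshoot.
        if counters is not None:
            name = "apollo_guidance_attach_timeout_total"
            counters[name] = counters.get(name, 0) + 1
        return response
    return {**response, "apollo_guidance": guidance}
```

The caller never sees an exception from the guidance path, which matches the stated goal that a slow Apollo costs at most one turn of guidance.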
Apollo's LLM
Apollo runs its own LLM, separate from oracle's user-facing LLM routing. Apollo's LLM is the primary driver of synthesis, invoked per event (see §Learner → LLM synthesis).
Model: pluggable by design
The model is selected by configuration and must remain swappable without code changes. Apollo's LLM provider layer normalizes across providers so that a newer, stronger model can replace the current one as the state of the art advances.
Current default: MiniMax M2.7.
It is the best-available fit at the time of this spec given its context window, cost profile, and availability — but the spec is deliberately agnostic. Apollo must not encode MiniMax-specific assumptions in prompt shapes, input formats, or response parsers. The provider layer handles any per-model translation.
Operators can swap the model by changing env vars only:
APOLLO_LLM_PROVIDER=minimax # current default; swap with any provider registered in the router
APOLLO_LLM_MODEL=m2.7 # current default; replace with a newer model when available
APOLLO_LLM_API_KEY=...
APOLLO_LLM_BASE_URL=... # for self-hosted or proxied inference
APOLLO_SYNTHESIS_MAX_CONCURRENT=4 # cap on concurrent synthesis calls (event bursts from L3)
APOLLO_GUIDANCE_TIMEOUT_MS=50 # timeout for admin GET /guidance* inspection calls
The LLM router (oracle/apollo/llm.py) must support MiniMax as a first-class provider alongside anthropic / openai / groq / trinity / ollama in oracle's existing router. New providers register through the same interface — adding a model is an additive router change, never a change to Apollo's business logic.
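One way to keep provider additions additive is a small registry keyed by the APOLLO_LLM_PROVIDER value, sketched below. The decorator, the factory signature, and the tuple returned in place of a real client are illustrative assumptions and not the interface of oracle/apollo/llm.py.

```python
from typing import Callable, Dict

ProviderFactory = Callable[[Dict[str, str]], object]
_PROVIDERS: Dict[str, ProviderFactory] = {}


def register_provider(name: str):
    """Register a factory under a provider name; adding a provider touches nothing else."""
    def decorator(factory: ProviderFactory) -> ProviderFactory:
        _PROVIDERS[name] = factory
        return factory
    return decorator


@register_provider("minimax")
def _minimax(env: Dict[str, str]) -> object:
    # Placeholder for a real client object.
    return ("minimax", env.get("APOLLO_LLM_MODEL", "m2.7"))


def build_client(env: Dict[str, str]) -> object:
    """Select the provider purely from configuration."""
    name = env.get("APOLLO_LLM_PROVIDER", "minimax")
    try:
        factory = _PROVIDERS[name]
    except KeyError:
        raise ValueError(f"unknown APOLLO_LLM_PROVIDER: {name!r}") from None
    return factory(env)
```

Callers only ever go through build_client, so swapping models stays an env-var change and the synthesis code never branches on the provider name.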
Local MiniMax via HuggingFace (native, pre-trained, on-disk)
Apollo's LLM layer reserves a provider slot for a locally-stored, HuggingFace-pulled MiniMax model — a complement to the default OpenAI-compatible endpoint path. This path is intended for deployments where one or more of the following holds:
- The cluster is air-gapped and cannot reach MiniMax's hosted endpoint.
- Operators prefer running inference on their own GPU inventory for latency, cost, or data-governance reasons.
- A fine-tuned MiniMax variant the operator owns needs to be loaded instead of the stock checkpoint.
Provider selector. Set APOLLO_LLM_PROVIDER=minimax-local (see §Environment Configuration). The openai provider continues to be the default for hosted deployments; nothing about the hosted path changes.
Canonical HuggingFace load signature. The provider MUST honor the model card's canonical call shape — the same two lines the MiniMax team publishes on the model page:
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("MiniMaxAI/MiniMax-M2.7", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("MiniMaxAI/MiniMax-M2.7", trust_remote_code=True)
trust_remote_code=True is required because MiniMax ships its own tokenizer and modeling code alongside the weights. A future operator who swaps to a fine-tune with custom modeling code needs the flag too.
On-disk location (HuggingFace convention). HuggingFace caches pulled models under:
${HF_HOME:-~/.cache/huggingface}/hub/models--MiniMaxAI--MiniMax-M2.7/
blobs/ # content-addressed weight shards
snapshots/<sha>/ # symlinks to blobs for the resolved revision
refs/ # branch/tag pointers
The exact layout is HuggingFace's; Apollo does not parse or override it. Operators pre-pull the model with either of:
huggingface-cli download MiniMaxAI/MiniMax-M2.7
# or, equivalently, any Python that calls from_pretrained() once to warm the cache.
Pre-pulling is the recommended pattern for production: the first from_pretrained call in a cold container downloads tens of gigabytes of weights, which is not acceptable on the request path. Pre-pull during image build or via an init container.
Operator-controlled path (reserved knob, not yet implemented). A future enhancement will add APOLLO_LLM_LOCAL_MODEL_PATH for operators who keep weights outside the HF cache (e.g., a mounted shared filesystem with a custom fine-tune). When set, the provider passes the path verbatim as the first positional argument to from_pretrained instead of the MiniMaxAI/MiniMax-M2.7 model id. Until that knob is wired, the provider loads only the stock MiniMax checkpoint from the HF cache.
Disk + GPU requirements. MiniMax-M2.7 is a large model: expect the checkpoint to land in the tens of gigabytes on disk, and plan for a GPU (or multi-GPU node) with enough VRAM for the resolved context window. Deployments that cannot meet those budgets should stay on the hosted endpoint path.
What ships today vs. what is deferred. The minimax-local provider is scaffolded in oracle/apollo/llm.py at Milestone 8 — it imports transformers lazily, honors the canonical load signature above, and can complete a prompt on a machine that has the weights and deps in place. The following production-grade enhancements are intentionally deferred to a later milestone and tracked under §Deferred below:
- Thread-pool / process-pool offload of the synchronous HF forward pass (today the call runs inline on the event loop).
- Explicit device mapping (device_map="auto", torch_dtype, bitsandbytes / 4-bit / 8-bit quantization knobs).
- APOLLO_LLM_LOCAL_MODEL_PATH operator override for non-HF-cache paths.
- Pre-pull orchestration + readiness gate (block APOLLO_LLM_PROVIDER=minimax-local deployments from serving until the checkpoint is resident and the forward pass warms successfully).
- Streaming tokens through the provider abstraction (admin chat UX).
Until those land, minimax-local is a dev-time and air-gapped-lab-time fallback; the default production pattern remains APOLLO_LLM_PROVIDER=openai with APOLLO_LLM_BASE_URL pointed at an OpenAI-compatible MiniMax endpoint.
Separation from oracle's user-facing LLM
Apollo's LLM configuration is independent of oracle's user-facing LLM routing. The two can use the same provider or different providers; the same model or different models. Apollo's usage is tracked separately via the Meter (component.oracle.gateway §Metering) under client id apollo. User-facing chat metrics and Apollo metrics are separate in dashboards.
Apollo owns its LLM client — not axonis-core's Client
axonis-core provides the platform's shared LLM client (axonis.llm.Client; platform.axonis-core#llm-pattern) for user-facing
chat (L1/L2) and simple backend services. Apollo does not consume it. It keeps a purpose-built client at
oracle/apollo/llm.py (LLMClient), because the Curator and admin-chat have requirements the shared,
deliberately-lightweight core client does not — and should not — carry. Each difference is load-bearing, not incidental
duplication:
| Apollo capability | Why Apollo needs it | Why it stays out of core Client |
|---|---|---|
| response_format="json" + LLMResponse.as_json() / parsed | The Curator/synthesis path demands strict-JSON proposals against a documented schema; as_json() parses defensively and returns None on a malformed body so the dispatcher routes to drift review rather than dropping a bad proposal (apollo/learner/prompts.py, synthesis.py). | Core Client is a generic completion surface — no JSON-format biasing, no parse / None-on-malformed contract. Adding it would couple core to Apollo's synthesis semantics. |
| tool_choice control (auto / none / required) + response_format="text" | Apollo's admin chat (apollo/chat/server.py) streams prose and offers inspection tools, steering / forcing / suppressing tool use per turn. | Core's OpenAI-compatible path is raw httpx and hardcodes tool_choice="auto" with no response_format — by design, to stay minimal. |
| minimax-local provider (in-process MiniMax-M2.7 via HuggingFace transformers) | Dev / air-gapped deployments with no network path to a hosted MiniMax endpoint. | platform.axonis-core mandates axonis-core stay lightweight with no ML dependencies; torch / transformers must not enter core. |
| openai SDK transport (not raw httpx) | First-class response_format + tool_choice against MiniMax's OpenAI-compatible endpoint. | Core uses raw httpx for its OpenAI-compatible providers and does not depend on the openai SDK there. |
| Stub + singleton harness (install_stub_response / install_stub_stream, reset_singleton, get()) | Synthesis / curator / admin-chat tests inject canned responses and stream programs to exercise dispatch and drift-routing deterministically, with no network or GPU. | Core Client is stateless and harness-free; its consumers mock at the call boundary instead. |
Consequently Apollo's LLMResponse (content, parsed, tokens_in, tokens_out, tool_calls: list[ToolCall(id, name,
arguments)]) and core's Response (text, tool_calls: list[ToolCall(id, name, input)], stop_reason,
input_tokens, output_tokens) differ deliberately — two value objects for two jobs, consumed by disjoint call sites
(Apollo's synthesis / admin-chat code + tests vs. oracle's chat loop).
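The None-on-malformed contract described for as_json() can be illustrated with a minimal sketch. The field names follow the LLMResponse shape listed above; treating parsed as a property, accepting only JSON objects, and the route helper are assumptions made for illustration, not the contents of oracle/apollo/llm.py.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ToolCall:
    id: str
    name: str
    arguments: dict[str, Any]


@dataclass
class LLMResponse:
    content: str
    tokens_in: int = 0
    tokens_out: int = 0
    tool_calls: list[ToolCall] = field(default_factory=list)

    def as_json(self) -> Optional[dict[str, Any]]:
        """Parse content as a JSON object; return None instead of raising."""
        try:
            value = json.loads(self.content)
        except (TypeError, ValueError):  # JSONDecodeError is a ValueError
            return None
        # Assumption: only a JSON object counts as a well-formed proposal.
        return value if isinstance(value, dict) else None

    @property
    def parsed(self) -> Optional[dict[str, Any]]:
        # Assumption: parsed is modelled here as a view over as_json().
        return self.as_json()


def route(response: LLMResponse) -> str:
    """Hypothetical dispatcher step: malformed bodies go to drift review."""
    return "commit_check" if response.as_json() is not None else "drift_review"
```

Returning None rather than raising keeps the malformed proposal in hand, so the caller can route it to review instead of losing it in an exception path.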
Decision (2026-06-04): Apollo's LLM client is NOT converged onto axonis-core's Client. The two share design DNA
(provider-agnostic completion + streaming with a typed terminal response) but serve different layers. Converging would
either strip the Curator's JSON-synthesis robustness and the admin-chat controls, or push Apollo-specific features and
heavy ML dependencies into the lightweight core. The only genuine overlap — raw provider calling — cannot be cleanly
shared anyway, because Apollo requires response_format / tool_choice that core's httpx path omits. Revisit only if
Apollo's needs converge with the user-facing client (e.g. it drops the local-model and strict-JSON-synthesis
requirements).
Budget isolation
A burst of synthesis calls triggered by a long fusion run must not starve user-facing chat. Apollo's LLM has its own rate limit, its own quota, and its own metering client id. When Apollo's quota is exceeded it defers synthesis (the event queue holds triggers up to a cap); user-facing oracle chat is unaffected.
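A rough sketch of the deferral behaviour, assuming a simple per-window call quota and a bounded trigger queue. The class name, both parameters, and the choice to count triggers that arrive after the cap is reached as dropped are illustrative assumptions; the spec only states that the queue holds triggers up to a cap.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SynthesisBudget:
    quota_per_window: int  # hypothetical: synthesis calls allowed per window
    queue_cap: int         # hypothetical: maximum deferred triggers held
    used: int = 0
    dropped: int = 0
    deferred: deque = field(default_factory=deque)

    def submit(self, trigger: str) -> str:
        """Run the trigger if quota remains, otherwise defer it."""
        if self.used < self.quota_per_window:
            self.used += 1
            return "run"
        if len(self.deferred) < self.queue_cap:
            self.deferred.append(trigger)
            return "deferred"
        self.dropped += 1  # assumption: overflow is counted, never silent
        return "dropped"

    def start_window(self) -> list[str]:
        """Reset the quota and release as many deferred triggers as fit."""
        self.used = 0
        released = []
        while self.deferred and self.used < self.quota_per_window:
            released.append(self.deferred.popleft())
            self.used += 1
        return released
```

Nothing in this object touches oracle's user-facing chat path, which is the point of the isolation: exhausting it only delays Apollo's own synthesis.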
Drift Prevention
Apollo influences a large fraction of the system. Drift in its artifacts cascades into the prompts of layer 1 and the outputs of layer 3. The spec encodes several anti-drift guarantees:
- Observation cadence is fixed and coarse. No per-token events. High-signal-to-noise ratio in the raw data.
- Graphs are the deterministic anchor. Decision graphs update on every observation via rule-based extractors — they never create free-form artifacts and cannot drift on their own. They record what actually happened, not what the LLM thinks happened.
- Every LLM output is checked against the graphs. The LLM is the primary driver of synthesis, but every artifact, promotion, and pattern it proposes is validated against current graph state and trajectory before the Curator commits it. Proposals that diverge from graph-recorded reality are flagged as DriftEvent and held for admin review.
- Drift is detected structurally, not rate-limited. Short-window vs. long-window edge-weight divergence, rate-of-new-nodes caps, LLM-output-vs-graph divergence, and trajectory breaks distinguish smooth evolution (allowed) from sudden shift (flagged). Apollo can learn continuously because the graphs provide a rigid referent.
- Curator is bounded. Cannot touch auth, guardrails, or user data — only Apollo's own artifacts.
- Every Curator action is auditable. Admin can see what changed, when, why, and by whom.
- Every artifact is versioned. Rollback is always possible.
- Evaluator closes the loop. Artifacts that stop correlating with good outcomes decay automatically.
- Admin can pause the Curator. An emergency off-switch prevents runaway mutation.
- Guidance degrades gracefully. If Apollo is slow or wrong, oracle falls through without injection — the base system still functions.
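The short-window vs. long-window divergence check can be sketched with two exponentially weighted moving averages over the same edge. The smoothing factors and the absolute tolerance below are placeholder assumptions; the spec's own detector is configured through APOLLO_GRAPH_EWMA_SHORT, APOLLO_GRAPH_EWMA_LONG, and the APOLLO_DRIFT_* thresholds, whose exact semantics are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class EdgeWeight:
    alpha_short: float = 0.5   # hypothetical short-window smoothing factor
    alpha_long: float = 0.05   # hypothetical long-window smoothing factor
    short: float = 0.0
    long: float = 0.0

    def observe(self, value: float) -> None:
        """Standard EWMA update: w <- alpha * x + (1 - alpha) * w."""
        self.short = self.alpha_short * value + (1 - self.alpha_short) * self.short
        self.long = self.alpha_long * value + (1 - self.alpha_long) * self.long

    def divergence(self) -> float:
        """Gap between recent behaviour and the long-run baseline."""
        return abs(self.short - self.long)


def is_sudden_shift(edge: EdgeWeight, tolerance: float) -> bool:
    """Flag the edge when the two windows disagree by more than tolerance."""
    return edge.divergence() > tolerance
```

A slow change moves both averages together and keeps the gap small, while an abrupt change moves the short window first and opens the gap, which is how smooth evolution is separated from a sudden shift.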
Phased Rollout
Apollo ships in three phases to manage scope and risk.
Phase 1 — Observe and ground
- Observer with oracle as the sole Phase-1 emitter — oracle calls oracle.apollo.observer.ingest in-process for every L1-, L2-, and L3-origin event. POST /api/v1/apollo/observations is mounted for admin replay/seed and future out-of-process emitters (not used by oracle or cortex).
- Memory indices live: apollo_observations, apollo_graph_nodes, apollo_graph_edges, apollo_graph_snapshots
- Deterministic graph updates on every ingested observation (extractors, node/edge upserts, EWMA weights)
- Hourly graph snapshots + maintenance task (delete_by_query on expires_ts, snapshot coarsening)
- Guidance API serves graph-derived context only (no artifacts yet; artifacts index is empty)
- Admin memory and graph CRUD endpoints
- Admin chat (read-only — can inspect graphs, observations, lineage; cannot yet promote/demote)
- Backend emitter integration: oracle itself + cortex (parallax deferred to a follow-on phase per Q7); cortex is the Phase 1 L3 subscriber per Q20. Beacon (L1) is deferred until a beacon↔oracle connection is designed.
- Apollo is additive: Apollo lives entirely under oracle/apollo/ with its own memory, indices, and stores. Oracle's existing memory modules (oracle/server/memory/conversation.py, oracle/server/memory/cross_service.py, oracle/server/models/memory.py) remain untouched and continue to serve their current callers.
Phase 2 — Synthesize and advise
- Event-driven LLM synthesis loop live (triggers: L1 request ingested, L3 output ingested, admin chat, admin-initiated)
- Artifact creation, editing, promotion, demotion (IntentPattern, FailurePattern, ToolPairingHint, etc.)
- Graph-anchor drift check on every LLM synthesis output
- DriftEvent artifacts produced on divergence; flagged proposals held for admin review
- Evaluator scoring active (weighted failure signals feed rolling per-artifact scores)
- Injection Channel live. Oracle attaches current applicable guidance to every /chat response body (L1) and to every outbound MCP dispatch bound for an agent-kind L3 service. Attach budget APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS (default 10 ms) omits the field on overshoot without failing the request.
- Admin inspection endpoints (GET /guidance?scope=..., admin-only SSE debug feed) live
- Admin chat fully active (can trigger synthesis, promote/demote/rollback artifacts, pause Curator)
- Remaining backend emitters onboard (see §Integration Backlog)
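The attach budget in the Injection Channel item can be sketched with asyncio.wait_for. The function name attach_guidance and the fetch_guidance callable are hypothetical; only the apollo_guidance field name and the 10 ms default come from this spec.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional


async def attach_guidance(
    body: dict[str, Any],
    fetch_guidance: Callable[[], Awaitable[Optional[dict[str, Any]]]],
    timeout_ms: int = 10,
) -> dict[str, Any]:
    """Return body with apollo_guidance attached, or unchanged on overshoot."""
    try:
        guidance = await asyncio.wait_for(fetch_guidance(), timeout_ms / 1000)
    except asyncio.TimeoutError:
        return body  # budget exceeded: omit the field, do not fail the request
    except Exception:
        return body  # assumption: internal Apollo failure is treated the same way
    if guidance is None:
        return body  # nothing applicable for this caller
    return {**body, "apollo_guidance": guidance}
```

In a real handler the timeout branch would also increment a counter such as apollo_guidance_attach_timeout_total; that bookkeeping is left out to keep the sketch short.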
Phase 3 — Empower and maintain
- Curator autonomous actions enabled (promote/demote/forget without admin approval, bounded by §Curator hard invariants)
- Evaluator-driven demotion and forgetting cycles (score below threshold for N=5 cycles → forget, audited)
- LLM-driven compaction of expiring observations into summary artifacts (event-driven, at admin-initiated synthesis)
- Full audit and rollback surface live (apollo_audit index + apollo_artifact_history)
- Oracle's existing memory modules (conversation.py, cross_service.py, models/memory.py) remain in place. Any consolidation is out of scope for the Apollo rollout — see §Deferred: Consolidation of Oracle Memory Modules.
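The evaluator-driven forgetting rule (score below threshold for N=5 cycles, then forget with an audit record) might look like the sketch below. Counting consecutive cycles and resetting the streak on recovery is an assumption, as are the ArtifactScore type, the record_cycle helper, and the audit record fields.

```python
from dataclasses import dataclass


@dataclass
class ArtifactScore:
    artifact_id: str
    cycles_below: int = 0
    forgotten: bool = False


def record_cycle(state: ArtifactScore, score: float, threshold: float,
                 audit: list[dict], forget_after: int = 5) -> bool:
    """Update the streak for one evaluator cycle; return True once forgotten."""
    if state.forgotten:
        return True
    # Assumption: the streak counts consecutive cycles and resets on recovery.
    state.cycles_below = state.cycles_below + 1 if score < threshold else 0
    if state.cycles_below >= forget_after:
        state.forgotten = True
        # Every autonomous mutation leaves an audit record (hypothetical shape).
        audit.append({"action": "forget", "artifact_id": state.artifact_id,
                      "actor": "curator", "cycles_below": state.cycles_below})
    return state.forgotten
```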
Deferred: Consolidation of Oracle Memory Modules
Oracle today has three memory modules that predate Apollo:
- oracle/server/memory/conversation.py — Redis-backed multi-turn conversation history (ConversationStore)
- oracle/server/memory/cross_service.py — Redis KV cross-namespace fact store (CrossServiceMemory)
- oracle/server/models/memory.py — a stub over axonis-core's Memory(UDS) primitive
Apollo provides overlapping capabilities: conversation lineage is reconstructible from user_prompt + final_response observations; fact storage is superseded by artifacts + graphs; the UDS memory class is already the canonical substrate under platform.axonis-core.
Recommendation (deferred, not in scope): once Apollo is proven in production and its graphs/artifacts demonstrably cover the use cases served by these modules, the three can be consolidated:
- conversation.py → absorbed into oracle/apollo/memory/ (if still needed beyond observation reconstruction)
- cross_service.py → deprecated; call sites migrate to Apollo guidance queries or direct Memory(UDS) reads
- models/memory.py → deleted; any imports switch to axonis_core.userspace.intelligence.Memory
Status: not scheduled. Oracle's existing modules stay in place throughout all three Apollo phases. Consolidation requires a separate, explicitly-scoped effort once the user approves it — Apollo will not reach into or replace oracle's existing memory surface as part of this spec.
Integration Backlog
Phase 1 brings oracle and cortex into Apollo's observation stream, with cortex (L3) as the lone guidance subscriber. Oracle is the sole emitter: cortex carries no Apollo emission code; oracle observes each MCP round-trip to it and emits tool_output / tool_error in-process on its behalf (§Ingest Semantics → Primary path). Beacon (L1) is deferred from Phase 1 because beacon has no HTTP connection to oracle today: its MCP_SERVER_URL points directly at cortex, so attached apollo_guidance has no path into beacon's process. The beacon↔oracle connection is a separate spec decision, tracked as an item in this backlog. Parallax is also deferred from Phase 1; its emitter (oracle-observed) and subscriber (ApolloGuidanceCache) wiring follow the same pattern as cortex when it onboards.
Additional services become visible to Apollo through one of two mechanisms, chosen per-service based on how oracle reaches them:
- In-process relay (default). Any service oracle MCP-dispatches to — i.e., any service registered on ServiceRegistry and reachable through oracle's tool-use or MCP-proxy paths — is automatically observed by oracle with no code changes in that service. Onboarding is a single-line addition: setting component_kind on the service's ServiceRegistry record.
- Direct POST via ApolloClient (fallback). Services whose outputs are not observable through an oracle-mediated MCP round-trip (batch jobs, out-of-process workers, federated emitters, etc.) POST envelopes to /api/v1/apollo/observations themselves. No Phase-1 service uses this path.
Until one of these two mechanisms is wired for a service, its outputs are invisible to Apollo.
component_kind classification
Every L3 service that registers with oracle's ServiceRegistry must declare a component_kind:
- agent — has its own LLM, makes prompt-driven decisions, receives apollo_guidance attached to every MCP dispatch from oracle, and has an intent contract against which schema_mismatch (Evaluator signal 2) is evaluated.
- library — no LLM, purely operational (CRUD, compute, I/O). Emits tool_output / tool_error observations but does not receive apollo_guidance in its MCP dispatches (oracle filters it out before serialization) and is not subject to the schema_mismatch signal. Evaluator signal 1 (hard error) still applies.
The classification is a field on the ServiceRegistry record (server/mcp/registry.py). It is set at registration time and may be changed by the owning team via re-registration. Apollo treats ServiceRegistry as the single source of truth — no separate Apollo registration is used (§Invariants item 16).
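The filtering rule for the two kinds can be sketched as a single check at dispatch time. The build_dispatch helper and the plain-dict stand-in for ServiceRegistry are hypothetical; leaving guidance off for services with no declared kind is an assumption.

```python
from typing import Any, Optional


def build_dispatch(service: str, payload: dict[str, Any],
                   guidance: Optional[dict[str, Any]],
                   registry: dict[str, str]) -> dict[str, Any]:
    """Attach apollo_guidance only when the target service is agent-kind."""
    kind = registry.get(service)  # stand-in for a ServiceRegistry lookup
    if kind == "agent" and guidance is not None:
        return {**payload, "apollo_guidance": guidance}
    # library-kind (and, by assumption, unclassified) services get none.
    return dict(payload)


# Example registry using the kinds from the status table.
registry = {"cortex": "agent", "fedai-rest": "library"}
```

Because the kind is read on every dispatch, a re-registration with a changed component_kind takes effect on the next call without any Apollo change.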
Status table
Classifications below are initial best-guesses; each service's owning team confirms on integration. Misclassification is low-risk: a wrongly-tagged library will simply see apollo_guidance attached to its dispatches and ignore it (no LLM to read it); a wrongly-tagged agent will miss guidance it could have used. Either is corrected by updating the ServiceRegistry record.
| Service | Kind | Status | Notes |
|---|---|---|---|
| oracle | n/a (L2) | Phase 1 | In-process emission; no network call; not an L3 subscriber |
| cortex | agent | Phase 1 | Query-adjacent reasoning; Phase 1 subscriber wiring lands in M14 |
| parallax | agent | Deferred | Same pattern as cortex when onboarded; fusion-run execution; LLM-driven workflow |
| fedai-rest | library | Pending | Dataset CRUD; ops / libs host; emits on dataset read/write + op outcomes |
| testament | TBD | Pending | Kind + emitter integration TBD by team |
| titan | TBD | Pending | Kind + emitter integration TBD by team |
| rest / fedai-rest | library | Pending | Federation REST layer; emits on federated request outcomes |
Onboarding a pending service requires no Apollo code changes and — when the service is reachable from oracle's MCP dispatch path — no code changes in the pending service either. The only required artifact is the component_kind declaration on that service's ServiceRegistry record. Services on the fallback path additionally import ApolloClient and emit from their own process.
Environment Configuration
Apollo does not redefine any env var that already exists in the
platform deployment layer. The canonical source for deployment-level
configuration is developers-environment/conf/*.env — one file per
target (development.axonis.ai.env, matrix.axonis.ai.env,
edge.axonis.ai.env, vector.axonis.ai.env, etc.). Every target ships
a consistent platform baseline; Apollo inherits it transitively through
axonis-core, oracle, and its own storage/logger dependencies.
Inherited platform variables (not Apollo's to define)
| Variable(s) | Consumer | Apollo's use |
|---|---|---|
| ELASTIC_HOST, ELASTIC_USERNAME, ELASTIC_PASSWORD, ELASTIC_VERIFY, ELASTIC_TIMEOUT, TEMPLATES_DIR, ELASTIC_PKI_CA | axonis.elastic.Elastic | Storage for apollo_observations, apollo_artifacts, apollo_graph_*, apollo_audit. Every Memory(UDS) subclass in apollo/memory/store.py inherits this config. |
| REDIS_URL (oracle-style) or REDIS_HOST + REDIS_PORT + REDIS_PASSWORD + REDIS_DB + REDIS_TLS + REDIS_VERIFY (platform-standard) | oracle/server/memory/*, axonis.redis.Redis | Oracle's ConversationStore + CrossServiceMemory; unused directly by Apollo. |
| SSO_CLIENT_ID, SSO_CLIENT_SECRET, SSO_TOKEN_URL, SSO_WELLKNOWN, SSO_INTROSPECT_URL, SSO_VERIFY | oracle's OAuthMiddleware (+ axonis.auth) | Validates Bearer tokens on every request reaching /api/v1/apollo/*. No Apollo-specific auth config. |
| ATLAS_LOG_LEVEL, ATLAS_WORKSPACE, AXONIS_LOG_LEVEL, AXONIS_WORKSPACE | axonis.logger (§Logging) | Log level + log-file root for Apollo's three logger streams (log/error/audit). oracle/tests/conftest.py also respects ATLAS_WORKSPACE for test-session log placement. |
| FEDERATE_DOMAIN, FEDERATE_NAME, FEDERATE_UUID, FEDERATE_PARTY_*, FEDERATE_PROTOCOL_*, FEDERATE_WORK_MODE_* | axonis.uds federation hooks | Picked up automatically if/when Apollo artifacts start federating (post-Milestone 13). No Apollo-specific federation config. |
Apollo-owned variables (all APOLLO_*)
Canonical location: developers-environment/conf/*.env — specifically the shared dev-env file (development.axonis.ai.env) plus any target-specific overrides (matrix.axonis.ai.env, vector.axonis.ai.env, edge.axonis.ai.env). Every APOLLO_* variable is declared there with a production-ready default. oracle/apollo/settings.py reads them via os.getenv(...) with fall-back defaults that match the env-file values, so if the shared env is unsourced the system still comes up sensibly — but the authoritative source is the deployment env file.
Why it lives in the shared env file rather than per-service: Apollo's observation path runs in oracle, but its configuration surface informs the contract every other service consumes (guidance attach budgets, trace-propagation expectations, retention windows). Keeping the defaults in the shared env file means oracle, parallax, cortex, and beacon all load the same baseline — an operator flipping APOLLO_CURATOR_AUTONOMOUS=true in the shared file affects the whole deployment consistently.
Every APOLLO_* variable Apollo's settings.py reads is mirrored in the env file. Grouped by subsystem:
- LLM: APOLLO_LLM_PROVIDER, APOLLO_LLM_MODEL, APOLLO_LLM_API_KEY, APOLLO_LLM_BASE_URL (+ reserved APOLLO_LLM_LOCAL_MODEL_PATH, not yet implemented)
- Synthesis: APOLLO_SYNTHESIS_MAX_CONCURRENT
- Guidance attach: APOLLO_GUIDANCE_ATTACH_ENABLED, APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS, APOLLO_GUIDANCE_TIMEOUT_MS
- Ingest client side: APOLLO_INGEST_BATCH_SIZE, APOLLO_INGEST_FLUSH_INTERVAL_MS, APOLLO_INGEST_POST_TIMEOUT_SEC, APOLLO_INGEST_RETRY_ATTEMPTS, APOLLO_INGEST_RETRY_BASE_MS, APOLLO_INGEST_RETRY_CAP_MS
- Ingest server side: APOLLO_INGEST_QUEUE_MAXSIZE, APOLLO_INGEST_WORKER_CONCURRENCY, APOLLO_INGEST_WORKER_RETRY_ATTEMPTS, APOLLO_INGEST_DEAD_LETTER_PATH, APOLLO_INGEST_STALE_WARN_SEC, APOLLO_INGEST_DEPTH_WARN, APOLLO_INGEST_DEDUPE_WINDOW_SEC, APOLLO_EMITTER_ENABLED
- Decision Graphs: APOLLO_GRAPH_SNAPSHOT_INTERVAL, APOLLO_GRAPH_EWMA_SHORT, APOLLO_GRAPH_EWMA_LONG, APOLLO_GRAPH_TRACE_STATE_TTL_SEC
- Evaluator: APOLLO_EVALUATOR_WEIGHT_L3_ERROR, APOLLO_EVALUATOR_WEIGHT_SCHEMA_MISMATCH, APOLLO_EVALUATOR_WEIGHT_USER_FEEDBACK, APOLLO_EVALUATOR_WEIGHT_CONFIDENCE, APOLLO_EVALUATOR_L3_FAST_DEMOTE_N, APOLLO_EVALUATOR_NORMAL_DEMOTE_N
- Curator: APOLLO_CURATOR_AUTONOMOUS, APOLLO_CURATOR_AUTO_INTERVAL_SEC, APOLLO_COMPACTION_AUTO
- Audit: APOLLO_AUDIT_RETENTION_DAYS, APOLLO_INJECTION_AUDIT_HEARTBEAT_INTERVAL
- Maintenance: APOLLO_MAINTENANCE_INTERVAL, APOLLO_OBSERVATION_RETENTION_DAYS
- Trace propagation: APOLLO_TRACE_HEADER, APOLLO_REQUIRE_TRACEPARENT
- Observation obligations: APOLLO_REQUIRE_INTENT_SCHEMA
- Drift detection: APOLLO_DRIFT_Z_SCORE_THRESHOLD, APOLLO_DRIFT_NEW_NODES_PER_HOUR_CAP, APOLLO_DRIFT_TRAJECTORY_TOLERANCE
- Integration (ApolloClient): APOLLO_BASE_URL
None of these duplicate a platform variable. When adding a new APOLLO_*, add it to both oracle/apollo/settings.py (with its default) and the shared env file (with the same default) in one commit.
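As an illustration of the os.getenv-with-fallback pattern, a settings module could read two of these variables as sketched below. The helper names and the returned dict are hypothetical, and the False default for APOLLO_CURATOR_AUTONOMOUS is a placeholder assumption; only the 10 ms attach timeout default is stated in this spec.

```python
import os


def _env_bool(name: str, default: bool) -> bool:
    """Read a boolean flag; unset means default."""
    raw = os.getenv(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def _env_int(name: str, default: int) -> int:
    """Read an integer; unset or blank means default."""
    raw = os.getenv(name)
    return default if raw is None or raw.strip() == "" else int(raw)


def load_settings() -> dict:
    """Fallback defaults here must match the values in the shared env file."""
    return {
        "guidance_attach_timeout_ms": _env_int("APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS", 10),
        # Placeholder default, assumed for this sketch.
        "curator_autonomous": _env_bool("APOLLO_CURATOR_AUTONOMOUS", False),
    }
```

Keeping each default in exactly one call site makes the "same default in settings.py and the env file, in one commit" rule easy to review.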
Per-deployment overrides
Each *.env in developers-environment/conf/ targets a specific
deployment (development, matrix, vector, edge, etc.). The shared
development.axonis.ai.env holds the baseline; production targets
override via their own file. Any Apollo variable that needs to differ
per target lives in the target-specific env — never hardcoded into
settings.py. Operators change behavior by editing the env file and
reloading, not by shipping code.
Dependencies
[project]
dependencies = [
# inherited from oracle (see component.oracle.gateway)
"axonis-core",
"fastapi>=0.110.0",
"starlette>=0.36.0",
"redis>=4.0.0",
"anthropic>=0.40.0",
"openai>=1.0.0",
# apollo-specific
"sentence-transformers>=3.0.0", # embeddings
"numpy>=1.24", # dense-vector math
]
LLM provider SDK. Apollo's current default LLM is MiniMax M2.7 (see §Apollo's LLM). MiniMax exposes an OpenAI-compatible API, so Apollo reaches it via the existing openai client with APOLLO_LLM_BASE_URL pointed at the MiniMax endpoint — no new SDK dependency is added. If a future model swap requires a non-OpenAI-compatible provider, an additive dependency joins oracle's existing provider set.
Apollo introduces no new top-level dependencies beyond libraries already declared in oracle's pyproject.toml; it activates existing dependencies (notably sentence-transformers and numpy) that oracle already includes.
Invariants
- Apollo does not execute workflows. It observes, learns, and advises. It never calls tools, never invokes backend services, never retries a failed request. Layer 1 drives iteration.
- Curator empowerment is bounded to Apollo's own artifacts. Curator cannot change auth, guardrails, token scopes, user conversations, or any non-Apollo state.
- Every autonomous action is auditable. No Curator mutation occurs without a record in apollo_audit.
- Apollo is internal. No Apollo endpoint is exposed outside the cluster except through oracle's existing external surface. Oracle remains the only externally exposed service (component.oracle.gateway invariant 1).
- Apollo failures do not break oracle. Guidance attachment has a hard timeout (APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS); on overshoot or internal Apollo failure, oracle serializes the response / MCP dispatch without apollo_guidance. Ingest failures never block the emitter's task and are surfaced as metrics (apollo_ingest_post_failure_total, apollo_ingest_queue_dropped_total) on /stats — never silent.
- Apollo uses axonis-core's Memory UDS as its storage primitive. It does not re-implement or bypass the UDS pattern from platform.axonis-core.
- Admin chat is the only conversational surface. Role admin is required. Non-admin users interact with Apollo only transitively via oracle.
- Observation cadence is coarse by design. Token-level observations are prohibited. Turn-level, tool-level, error-level, and response-level only.
- Axonis-core remains ML-free. Any future ML dependencies (e.g., embedding generation) live in oracle/apollo/, not in axonis-core. platform.axonis-core invariant 1 is preserved.
- Artifacts are versioned; graphs are snapshotted. Every Curator mutation to an artifact creates a new version in apollo_artifact_history; graph-level rollback uses the hourly/daily/weekly snapshot tiers. Rollback is always possible.
- Oracle's existing memory modules are not modified. Apollo is additive and coexists with oracle/server/memory/* and oracle/server/models/memory.py throughout all three phases. Consolidation is deferred and out of scope (§Deferred: Consolidation of Oracle Memory Modules).
- Apollo's LLM is pluggable. No MiniMax-specific assumptions in prompts, input shapes, or response parsers. Model swap is a config change via APOLLO_LLM_PROVIDER / APOLLO_LLM_MODEL, never a code change.
- Layer 3 performance is the strongest failure signal. Evaluator weighting amplifies L3 errors and schema mismatches over softer signals, accelerates demotion on L3-dominant score drops, and cascades to flag upstream artifacts for synthesis review.
- Neither L1 nor L3 addresses Apollo directly. L1 talks to oracle; L3 talks to oracle (via MCP); oracle talks to Apollo. L1 and L3 hold no Apollo endpoint knowledge, no Apollo credentials, and make no Apollo calls on any in-production path. Oracle is Apollo's sole emitter for all Phase-1 events: L1-origin observations (intent_schema, user_prompt, user_feedback) are emitted by oracle in-process after oracle receives the corresponding signal from L1; L3-origin observations (tool_output, tool_error) are emitted by oracle in-process after the MCP round-trip to an L3 service returns. Guidance flows the same way in reverse: it reaches L1 attached to /chat responses, reaches oracle's own chat LLM in-process (no transport, since oracle hosts Apollo), and reaches L3 attached to outbound MCP tool dispatches. The POST /api/v1/apollo/observations endpoint exists as a secondary path for admin replay/seed and for future services running outside oracle's MCP dispatch reach; Phase-1 emitters do not use it. No long-lived connections, no service tokens, no push channel in production.
- Injection cannot execute code in any subscriber. Attached apollo_guidance (or in-process cache contents on the L2 path) carries artifact data only. Subscribers update a local cache and consult it on their next LLM turn. Apollo cannot force a subscriber to act, call a tool, or mutate any state beyond its own cache.
- No subscriber registry, no push channel. Apollo has no list of subscribers to push to. Guidance is delivered by oracle attaching the current applicable set to every response/dispatch leaving oracle (L1 attach, L3 attach) and consulted in-process by oracle's own chat LLM on the L2 path. L3 agent eligibility is still governed by component_kind on the ServiceRegistry record (libraries are filtered out before attachment); L1 eligibility is implicit (every /chat response carries L1 guidance); L2 consumption is implicit (oracle's tool-executor consults the local cache before every turn).
- Apollo is the cross-service knowledge transfer channel. MemoryService (axonis-core) is strictly per-service — every recall is scoped to the calling service's (user_id, service). A preference, fact, or instruction expressed to one service is never directly readable by another. When the same intent needs to shape behaviour across services (e.g. "user prefers concise responses" expressed to beacon should also bias oracle), Apollo's observation stream picks it up, synthesis distills it into an artifact (e.g. a PromptShim with applicability.service_name = null for cross-service scope), and the guidance attach channel delivers it to every applicable subscriber. Apollo never instantiates MemoryService for cross-service reads — its view is the observation stream, which inherently spans all services. This separation means cross-service knowledge transfer is always curated, audited, and reversible (demote / forget) rather than implicit through silent shared-index reads.
Test Expectations
- Observer tests: each event type round-trips correctly through ingest; trace_id and parent_trace_id stitching works; cadence limits are enforced (no token-level events accepted); every Phase-1 event — L1-origin (intent_schema, user_prompt, user_feedback), oracle's own (llm_turn, final_response), and L3-origin (tool_output, tool_error) — arrives via oracle's in-process emission path only. Oracle extracts L1 signals from /chat request body and feedback submissions, observes the MCP round-trip for L3 outputs, and calls oracle.apollo.observer.ingest in-process on both layers' behalf. A direct emit from L1 credentials or from cortex to any Apollo path is rejected in Phase-1 test fixtures (§Invariants 14).
- HTTP ingest tests (secondary path): the POST /api/v1/apollo/observations endpoint continues to function for admin replay/seed and for out-of-process emitters. ApolloClient.emit POSTs the envelope and traceparent with an appropriate Bearer token (admin token for replay, user-forwarded token for out-of-process emitters); server returns 202 as soon as the envelope is enqueued on the in-process async queue; queue overflow increments apollo_ingest_queue_dropped_total and is visible on /stats (never silent); client retries on transient failures within the configured attempt budget, then surfaces apollo_ingest_post_failure_total on permanent failure; at-least-once redelivery is deduped on (trace_id, event_type, timestamp, service) within APOLLO_INGEST_DEDUPE_WINDOW_SEC; background worker crashes move envelopes to the optional dead-letter JSONL path when retry budget exhausts; services over lag/staleness thresholds appear in degraded_emitters.
- Memory tests: observations, artifacts, graph nodes, graph edges, and graph snapshots indices support CRUD via the axonis-core Elastic base class; embeddings generated on store; semantic recall via kNN composes with filters; expires_ts + delete_by_query maintenance task coarsens and purges correctly.
- Graph update tests: extractors are deterministic on every observation; node/edge upserts are idempotent; EWMA short- and long-window weights update correctly; no artifacts are created on the deterministic path.
- Synthesis tests: each event-driven trigger (L1 request, L3 output, admin chat, guidance miss, admin-initiated) invokes the LLM once; concurrent synthesis is bounded by APOLLO_SYNTHESIS_MAX_CONCURRENT; duplicate triggers within a trace_id are coalesced to the latest observation; synthesis calls receive the correct subgraph and artifact context.
- Graph-anchor drift check tests: LLM proposals consistent with graph state are committed; proposals that contradict strongly-weighted edges are flagged as DriftEvent and held for admin review; rate-of-new-nodes cap triggers drift flagging.
- Guidance tests: intent → artifacts matching; layer filtering; caller-permission filtering (guardrails); empty-result fallback when artifacts index is empty; 50 ms timeout on the hot path.
- Evaluator tests: all four failure signals detected; L3 performance signals (1 and 2) weight heavier than user feedback and confidence; accelerated demotion (N=2) fires on L3-dominant score drops; upstream artifact re-flag cascade reaches
IntentPattern/PromptShim/SpecFragment; repeated L3 failures escalate toDriftEventrather than silent demotion; per-signal score decomposition is preserved in audit records. - Curator tests: each allowed action (promote, demote, forget, edit, rollback, compact); every disallowed action is refused (auth changes, guardrail changes, user-data access); audit record written for every mutation with
actor,trigger, before/after version; curator-pause blocks all Curator mutations. - Versioning tests: artifact mutation copies prior version to
apollo_artifact_historybefore overwrite; rollback restores the target version and creates a new version whoseprev_version_idpoints at the post-rollback state; rollback event itself appears in audit; graph snapshots restore correctly; structural graph mutations by admin are audited. - Admin chat tests: role gating (admin only); each chat tool executes correctly; audit log shows
actor: "admin:<username>";indefinite: trueflag works for critical actions. - Layer 1 schema tests: best-effort mode accepts traces without
intent_schemaand produces inferred nodes; required mode (APOLLO_REQUIRE_INTENT_SCHEMA=true) rejects schema-less traces with 400;intent_schema_coveragestat reports correct rolling percentage. - LLM swap tests: provider swap via env (
APOLLO_LLM_PROVIDER/APOLLO_LLM_MODEL) takes effect without code changes; MiniMax-via-OpenAI-compatible endpoint is exercised; no MiniMax-specific strings leak into prompt or response parsers. - Failure posture tests: admin
GET /guidancetimes out cleanly onAPOLLO_GUIDANCE_TIMEOUT_MS;APOLLO_GUIDANCE_ATTACH_TIMEOUT_MSovershoot on the attach path causesapollo_guidanceto be omitted from the response/dispatch and incrementsapollo_guidance_attach_timeout_totalwithout failing the user request; ingest queue overflow returns 202 withapollo_ingest_queue_dropped_totalincrement (never silent); Apollo module fails to import → oracle serves/chatand dispatches withoutapollo_guidance, ingest returns 503; Apollo hallucinates → evaluator demotes → next attached payload reflects the demotion → admin can force-rollback. - Injection channel tests: oracle attaches
apollo_guidance to every /chat response body when an applicable artifact set exists for the caller's L1 scope; oracle attaches apollo_guidance to every MCP dispatch bound for an agent-kind L3 service; dispatches bound for library-kind services do not carry apollo_guidance; attached payload contains as_of, artifacts, and rationale_summary (no injection_id, trigger, or evidence_ref — those are audit-only); attach-timeout overshoot (APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS) causes omission of apollo_guidance with an apollo_guidance_attach_timeout_total increment, not a request failure; Curator pause freezes the attached state (subscribers keep receiving the as-of-pause payload); L1 and L3 make no calls to Apollo endpoints in any test fixture — attempts from non-admin tokens to admin preview endpoints return 403.
- component_kind tests: ServiceRegistry records carry a component_kind field (agent | library); oracle attaches apollo_guidance only to MCP dispatches bound for agent-kind services; library-kind services emit observations but receive no apollo_guidance in their dispatches; Evaluator signal 2 (schema_mismatch) fires only for agent-emitted tool_output with an L1 intent schema on the trace, and is skipped for library-emitted events; re-registering a service with a changed component_kind takes effect on the next dispatch without Apollo redeploy.
- ApolloGuidanceCache SDK tests: cache.update(payload) replaces the full artifact set idempotently; each canonical accessor (get_system_prompt_additions, get_spec_fragments, get_tool_description_overrides, get_tool_pairing_hints, get_active_failure_patterns, get_service_connection_hints) returns correctly ordered (weight desc, recency desc) results; applicability filtering narrows by intent context; empty-cache fallback returns empty lists / None without blocking; the SDK holds no transport, no HTTP client, no auth state — it is a pure in-process data structure.
- Rationale + evidence tests: every attached apollo_guidance.artifacts[*] entry carries a non-empty rationale; every apollo_audit record carries a non-empty rationale and evidence_ref; LLM-driven actions produce synthesized rationales, deterministic Evaluator-driven actions produce templated rationales composed from score_decomposition; rationale and admin_note are distinct and both queryable; admin chat explain_decision(trace_id | artifact_id | audit_id) retrieves the stored rationale and resolves evidence_ref pointers to their underlying observations / graph snapshot / score decomposition; discuss_decision(artifact_id | audit_id) opens a chat thread with the rationale pre-loaded and permits inline action tool calls that themselves are audited with actor: "admin:<username>".
- Trace propagation tests: L1-minted traceparent arrives unchanged at oracle; oracle forwards the header unchanged on downstream MCP and REST dispatches via axonis_core.gateway.client.extract_http_headers(); MCP context field carries traceparent end-to-end; ApolloClient stamps both the header and envelope trace_id on every POST /observations; oracle mints a replacement and logs missing_traceparent when the header is absent (best-effort); oracle returns 400 when APOLLO_REQUIRE_TRACEPARENT=true and the header is absent or malformed; envelope trace_id wins when it differs from the header; a full lineage query returns every event for a single trace_id across all emitting layers.
- Integration tests: full lineage from Layer 1 intent_schema + user_prompt through oracle llm_turn and Layer 3 tool_output / tool_error to final_response, with observations captured at every boundary and artifacts produced by synthesis reflecting the lineage.
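The trace propagation expectations above can be illustrated with a minimal, non-normative sketch of header resolution at oracle's ingress. The header layout (version 00, 32-hex trace-id, 16-hex parent-id, 2-hex flags) follows W3C Trace Context. The function names, the `log` callable, the `MissingTraceparent` exception, and the choice to mint a replacement for a malformed header when the flag is off are all assumptions made for illustration, not part of the spec.

```python
import re
import secrets

# W3C Trace Context, version 00: "00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>".
_TRACEPARENT_RE = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class MissingTraceparent(Exception):
    """Hypothetical error; a route handler would map it to HTTP 400."""


def is_valid_traceparent(value):
    match = _TRACEPARENT_RE.fullmatch(value or "")
    if match is None:
        return False
    trace_id, parent_id, _flags = match.groups()
    # All-zero trace-id and parent-id are invalid per the W3C spec.
    return trace_id != "0" * 32 and parent_id != "0" * 16


def mint_traceparent():
    # token_hex(n) yields 2n lowercase hex characters; flags "01" (sampled) is an assumption.
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"


def resolve_traceparent(header, require, log):
    """Return the traceparent value to forward downstream.

    header:  inbound header value, or None when absent
    require: mirrors APOLLO_REQUIRE_TRACEPARENT
    log:     hypothetical callable that records an event name
    """
    if is_valid_traceparent(header):
        return header  # forwarded unchanged
    if require:
        raise MissingTraceparent("traceparent absent or malformed")
    log("missing_traceparent")
    return mint_traceparent()
```

With `require=False` a valid inbound header passes through untouched and only absent or unusable headers are replaced, which keeps the best-effort behavior of Phases 1 and 2 while leaving the Phase 3 flip a configuration change.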
Design Decisions
All 20 design decisions (Q1–Q20) are now locked. Summary for reviewers:
- Q1 (observation cadence): locked — turn + tool + error + final response, no tokens
- Q2 (learning cadence): locked — LLM-driven primary synthesis, event-driven (triggers: Layer 1 request ingested, Layer 3 output ingested, admin chat turn, guidance miss, admin-initiated synthesis). Decision graphs update deterministically per-observation as the supplemental grounding layer. Graphs anchor the LLM: every LLM output is checked against graph state, proposals that diverge from recorded reality are flagged as drift and held for review. No timed or batched synthesis.
- Q3 (Apollo's LLM): locked — pluggable by design. Current default MiniMax M2.7 via APOLLO_LLM_PROVIDER=minimax / APOLLO_LLM_MODEL=m2.7. Must be swappable with any newer/stronger model by env change alone; no MiniMax-specific assumptions in prompts or parsers. Budget tracked separately from user-facing chat. Apollo's LLM is independent of oracle's user-facing chat LLM (oracle/server/llm/tool_executor.py, configured via the existing 5-provider gateway). The two surfaces are distinct: /api/v1/chat runs oracle's chat LLM with Apollo guidance applied via the L2 in-process subscriber path (§L2 path); /api/v1/apollo/chat runs Apollo's MiniMax for admin synthesis/conversation with Apollo itself.
- Q4 (ingest back-pressure): locked — HTTP POST to /api/v1/apollo/observations with a bounded client-side retry (default 2 attempts, exponential backoff + jitter). The server-side handler enqueues onto an in-process asyncio queue (capacity APOLLO_INGEST_QUEUE_MAXSIZE, default 10000) and returns 202 immediately. A pool of background workers drains the queue asynchronously. Queue overflow is counted (apollo_ingest_queue_dropped_total{service}) and surfaced on /stats, never silent. Client POST timeout (APOLLO_INGEST_POST_TIMEOUT_SEC, default 30) is generous because the server-side operation is a local memory write, not WAN I/O. Lifecycle flush on subscriber shutdown ensures short-task libraries don't drop their final batch. See §Ingest Semantics.
- Q5 (retention): locked — 30 days raw observations; 90 days graph snapshots tiered (7d hourly → 30d daily → 90d weekly); artifacts indefinite; audit log ≥ 90 days. Expiry is application-managed via expires_ts + delete_by_query, matching the axonis-core /rest/uds/ convention (no Elastic ILM, no data streams, no rollovers). Mapping files live in oracle/apollo/templates/*_mapping.json alongside the pattern from rest/uds/templates/.
- Q6 (spec staging): locked — single component.oracle.apollo with phases marked inline (§Phased Rollout). Matches the structure of platform.axonis-core/02/03. The LLM and graph anchor are a closed system; splitting would force cross-spec forward references.
- Q7 (starter services): locked — Phase 1 services: oracle + cortex. Oracle is the sole emitter — oracle observes the /chat envelope (L1) and the MCP round-trip to cortex (L3) and calls the in-process observer directly. Cortex carries no Apollo emission code beyond the component_kind declaration on its ServiceRegistry record. Phase 1 subscriber: cortex (L3) — consumes apollo_guidance via ApolloGuidanceCache per Q20. Beacon (L1) is deferred: beacon has no HTTP connection to oracle today (its MCP_SERVER_URL defaults to cortex direct), so guidance has no path into beacon's process until the beacon↔oracle connection is designed. Parallax (L3) is deferred from Phase 1 to a follow-on phase; its emitter and subscriber wiring follow the same pattern as cortex when it onboards. Remaining services tracked in §Integration Backlog; each onboards via whichever §Ingest Semantics path fits (in-process relay when oracle MCP-dispatches to it — the default — or ApolloClient POST when it does not).
- Q8 (existing memory modules): locked — coexist. Apollo is additive; oracle's existing memory modules (
conversation.py, cross_service.py, models/memory.py) remain untouched throughout Apollo's rollout. Absorption/deprecation recommendation captured in §Deferred: Consolidation of Oracle Memory Modules, flagged as not scheduled.
- Q9 (failure signals): locked — all four signals feed the Evaluator (L3 error, L3 schema mismatch, user feedback, evaluator confidence), weighted. Layer 3 performance carries an amplified penalty: signals 1 and 2 use a heavier weight tier (default 3.0 vs 1.5 for user feedback, 0.5 for confidence), trigger accelerated demotion (N=2 cycles instead of N=5), flag upstream artifacts (IntentPattern, PromptShim, SpecFragment) for LLM review on next synthesis, and escalate to DriftEvent on repeated failures. Rationale: poor L3 performance indicates workflow generation and artifacts need updating, not a slow drift.
- Q10 (audit log storage): locked — Elastic apollo_audit index (flat, UDS-backed, expires_ts + delete_by_query, per §Retention). Default 90-day retention configurable via APOLLO_AUDIT_RETENTION_DAYS. Records can be marked indefinite: true for critical admin actions (forget, pause/resume, rollback) — null expires_ts, never deleted. Schema captures action, actor, trigger, before/after versions, full per-signal score decomposition, upstream-artifact flags, and optional admin notes. Queryable via GET /api/v1/apollo/audit with rich filters.
- Q11 (artifact versioning): locked — two-tier model, in place from Phase 1. Artifacts versioned per-mutation: current version in apollo_artifacts, prior versions copied to apollo_artifact_history (no expires_ts, indefinite). Every artifact carries version, prev_version_id, change_reason, actor. Rollback via POST /api/v1/apollo/artifacts/{id}/rollback, itself a versioned + audited event. Graphs use snapshot-based rollback instead of per-mutation versioning (hourly/daily/weekly tiers per Q5). Structural graph mutations by admin or Curator are audited.
- Q12 (layer 1 obligation): locked — best-effort in Phase 1 and Phase 2 (default). Apollo accepts traces without intent_schema; extractors fall back to prompt-inference and mark inferred nodes with reduced weight. Signal 2 (schema_mismatch) is dark for schema-less traces; L3-performance penalty still fires on signal 1. GET /apollo/stats exposes intent_schema_coverage so admins can see when coverage is high enough. Promote to required via APOLLO_REQUIRE_INTENT_SCHEMA=true (config flip, no code change) — natural at Phase 3 when Curator empowerment goes live.
- Q13 (guidance delivery): locked — symmetric across all three LLM tiers, with the transport appropriate to each layer. L1: response-attached on /api/v1/chat (beacon reads apollo_guidance from the response body and updates its local cache). L2: in-process — oracle's chat LLM (oracle/server/llm/tool_executor.py) consults a process-local ApolloGuidanceCache before each turn; no transport needed since oracle hosts Apollo. L3: MCP-arg-attached on every outbound tool dispatch bound for an agent-kind service (the agent pops the field, updates its request-scoped cache, and dispatches). No long-lived connections, no service-token infrastructure — guidance rides the ambient auth of the envelope it is embedded in (or the in-process call for L2). Attach budget is bounded by APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS (default 10 ms); overshoot omits the field/call without failing the request. Admin inspection preserved via GET /api/v1/apollo/guidance?scope=... (admin-only).
- Q14 (L3 taxonomy): locked — L3 is not homogeneous. Every L3 service declares component_kind on its ServiceRegistry record: agent (has an LLM; subscribes to injections; subject to schema_mismatch signal) or library (no LLM; emits observations only; not subscribed; not schema-evaluated). Apollo filters subscriber enumeration and Evaluator signal-2 application by this field. ServiceRegistry is the single source of truth; Apollo introduces no parallel registry. Misclassification is recoverable by re-registering the service — no Apollo code change or redeploy needed.
- Q15 (application contract): locked — ApolloGuidanceCache is a pure-stdlib in-process data structure with an update(apollo_guidance_block) sink (called by the subscriber's request handler when an inbound envelope carries apollo_guidance) and a fixed set of canonical accessors (get_system_prompt_additions, get_spec_fragments, get_tool_description_overrides, get_tool_pairing_hints, get_active_failure_patterns, get_service_connection_hints). Artifacts are ordered by (weight desc, recency desc); merge policy past ordering is the agent's choice. No transport, no HTTP client, no auth state inside the module — the file imports only typing and __future__. Distribution model: the canonical reference lives in axonis-core (axonis/apollo/guidance_cache.py). Subscribers SHOULD import directly from axonis-core (from axonis.apollo.guidance_cache import ApolloGuidanceCache) — this is oracle's and cortex's path. Vendoring is allowed but not preferred: if a future subscriber's dependency posture rules out taking on axonis-core (a different language runtime, a strict-isolation deployment, etc.), it MAY vendor the module under its own namespace; vendored copies must preserve the canonical contract verbatim and the subscriber owns drift detection. Cortex briefly vendored during M14 development, then unified on the canonical import once axonis-core was added to its pyproject.toml. Standardizes cross-service application by giving every L1/L3 agent the same SDK shape; only the import path varies.
- Q16 (rationale + audit conversation): locked — every Curator action writes an apollo_audit record with a rationale (LLM-synthesized for LLM-driven proposals; templated from score decomposition for deterministic Evaluator actions) and evidence_ref (observations, graph snapshot id, score decomposition, related drift events). The per-artifact rationale also travels with each artifact on response/dispatch-attached payloads so agents may log it when applying; evidence_ref stays in the audit record only to keep the on-wire payload small. Admin chat exposes explain_decision(trace_id | artifact_id) and discuss_decision(artifact_id) so admins can conversationally review Apollo's reasoning and act on findings inline. rationale is why Apollo acted; admin_note (optional) is why the admin acted. Both are preserved in the audit index.
- Q17 (trace propagation): locked — W3C Trace Context (traceparent header) end-to-end across L1 → L2 → L3. L1 mints; oracle forwards unchanged as an HTTP header on every downstream MCP / REST dispatch via an additive extension to axonis_core.gateway.client.extract_http_headers(). ApolloClient stamps both the header and the envelope trace_id on every observation POST (envelope wins on conflict). Greenfield: no existing tracing header in axonis-core or platform.axonis-core/02/03 is displaced. APOLLO_REQUIRE_TRACEPARENT=false through Phases 1–2 (oracle mints on absence, logs missing_traceparent); flip to true in Phase 3 alongside APOLLO_REQUIRE_INTENT_SCHEMA. No OpenTelemetry SDK dependency introduced; a future OTel integration consumes the same header without change. Realizes the OpenTelemetry aspiration noted in component.oracle.gateway.
- Q18 (L1 + L3 emission path): locked — L1 → Oracle → Apollo and L3 → Oracle → Apollo. Neither L1 nor L3 addresses Apollo directly. Oracle is Apollo's sole emitter for every Phase-1 event type: intent_schema, user_prompt, and user_feedback are emitted by oracle in-process when oracle receives the corresponding signal from L1 via its existing API surface (/chat body, feedback submission); tool_output and tool_error are emitted by oracle in-process when oracle's MCP dispatch to an L3 service returns. L1 and L3 hold no Apollo endpoint knowledge and no Apollo credentials; they do not participate in Apollo ingest on any in-production path. ApolloClient is used only by admin replay/seed and by future out-of-process emitters outside oracle's MCP dispatch reach (§Ingest Semantics → Secondary path); Phase-1 emitters call oracle.apollo.observer.ingest in-process. Rationale: preserves component.oracle.gateway invariant 1 (oracle is the only externally visible service), keeps the L1 and L3 API surfaces narrow, and avoids requiring either layer to know about Apollo at all.
- Q19 (ingest auth): locked — the primary in-process path has no network-auth boundary: oracle authenticates the user at the /chat / MCP ingress via OAuthMiddleware and then calls the Apollo observer as a direct in-process function call, so no additional token is required on emission. The secondary HTTP path (admin replay + out-of-process emitters) authenticates via a Bearer token — an admin token for replay/seed, or a user-forwarded token for an out-of-process emitter (the same token the service would have received on its inbound request). No Apollo-specific credential is issued; no new service-token mechanism is introduced. Background ingest without a user context (batch workers, standalone pipelines) is deferred pending component.oracle.gateway's Keycloak client-credentials work. See §Authentication & Authorization.
- Q20 (subscriber consumption contract): locked — every subscriber that runs an LLM consumes apollo_guidance by instantiating an ApolloGuidanceCache (axonis-core), calling cache.update(payload) on every inbound envelope that carries the field, and reading the canonical accessors before its next LLM call. Mandatory accessors per LLM call: get_system_prompt_additions(intent_context) (each returned PromptShim's content.text is appended to the system prompt) and get_tool_description_overrides(tool_name) (applied per tool while rendering the tool catalog). Optional accessors (consumed where the agent's domain warrants): get_spec_fragments, get_tool_pairing_hints, get_active_failure_patterns, get_service_connection_hints. Cache lifetime is layer-dependent: session-scoped at L1 (one cache per chat session, kept across turns, refreshed each time oracle's /chat returns a new payload) and request-scoped at L3 (one cache per inbound MCP tool call, populated from arguments.apollo_guidance, discarded after the tool returns). Failure posture: missing or malformed apollo_guidance is a no-op — cache.update(None) is legal, accessors return empty lists / None, and the LLM call proceeds with no guidance applied. Phase 1 subscriber: cortex (L3). Beacon's L1 wiring is deferred — beacon currently has no HTTP connection to oracle (its MCP_SERVER_URL defaults to cortex direct, not oracle), so there is no path through which apollo_guidance can reach beacon's process today. A separate spec decision must define how beacon becomes an oracle client before the L1 contract can be exercised. Parallax's L3 subscriber wiring is deferred to a later phase (same pattern as cortex when it lands). Rationale: pins the consumption contract that Q15 left open ("merge policy past ordering is the agent's choice"), so subscriber tests can prove guidance actually changes a downstream LLM call rather than being attached and discarded.
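To make Q9's weight tiers concrete, the following non-normative sketch combines the four signals into a penalty with a per-signal decomposition. The default weights (3.0, 1.5, 0.5) and the demotion thresholds (N=2 vs N=5) come from Q9; the signal key names, the [0, 1] severity scale, and the weighted-sum combination are assumptions made for illustration only.

```python
from dataclasses import dataclass

# Default weight tiers from Q9. The key names are illustrative, not the spec's identifiers.
SIGNAL_WEIGHTS = {
    "l3_error": 3.0,              # signal 1
    "schema_mismatch": 3.0,       # signal 2
    "user_feedback": 1.5,
    "evaluator_confidence": 0.5,
}
L3_SIGNALS = frozenset({"l3_error", "schema_mismatch"})


@dataclass(frozen=True)
class Score:
    penalty: float
    decomposition: dict
    demotion_cycles: int


def score_failure(signals):
    """signals maps signal name -> severity in [0, 1] (assumed scale)."""
    decomposition = {
        name: SIGNAL_WEIGHTS[name] * value for name, value in signals.items()
    }
    # Any L3-performance signal switches to the accelerated demotion threshold.
    l3_fired = any(signals.get(name, 0.0) > 0 for name in L3_SIGNALS)
    return Score(
        penalty=sum(decomposition.values()),  # weighted sum is an assumption
        decomposition=decomposition,
        demotion_cycles=2 if l3_fired else 5,
    )
```

Keeping the decomposition alongside the total mirrors Q10 and Q16, where the per-signal score decomposition is stored in the audit record and used to template rationales for deterministic Evaluator actions.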
Implementation Plan (milestone history)
Companion to: component.oracle.apollo-APOLLO.md (design)
Scope: a step-by-step build order for the Apollo package inside oracle. Each milestone is a small, reversible, independently shippable slice. Later milestones layer onto earlier ones; no milestone depends on work that appears later.
Audience: the engineer(s) implementing Apollo. This document answers "what do I build first, and in what order."
Principles
- Ship in small reversible slices. Every milestone ends with a mergeable PR that leaves oracle functional whether or not later milestones land.
- Stand up the skeleton before the brain. Observation intake and deterministic grounding come before LLM synthesis; the system must be able to record reality before it tries to reason about reality.
- Oracle stays working throughout. Apollo is additive. Do not modify oracle's existing routes, middleware, memory modules, or chat behavior except where explicitly scoped (ChatResponse extension, MCP dispatch argument injection, in-process observer calls).
- Follow established axonis conventions without exception. Pydantic for envelopes, Memory(UDS) for Elastic-backed classes, axonis_core.elastic.Elastic for CRUD, OAuthMiddleware for auth, HTTP + Bearer for transport, JSON templates under templates/ for index mappings. If a new pattern is tempting, stop and find the existing one first.
- Every milestone ends with tests. The design spec's §Test Expectations is the canonical test list; implement the subset relevant to each milestone as you go. Do not defer tests to the end.
- Observability from day one. The /stats endpoint exists from Milestone 1; every metric named in the design spec has a zero-valued counter registered at startup, so dashboards can be built before the code that drives them.
Milestone map
| # | Milestone | Why it comes here |
|---|---|---|
| 0 | Package scaffolding + settings + dependencies | Nothing else compiles without this. |
| 1 | Observation intake (HTTP POST + async queue + Elastic writes) | Apollo needs to receive and persist raw observations before anything downstream can work. |
| 2 | Deterministic graph updates (no LLM) | The grounding layer — lets Apollo record reality without interpreting it. Drift check in later milestones depends on this. |
| 3 | Guidance attach plumbing (empty artifact set OK) | Wire apollo_guidance into /chat responses and MCP dispatches with an always-empty artifact set; proves end-to-end wiring without needing any learned artifacts. |
| 4 | Subscriber SDK (ApolloGuidanceCache in axonis-core) | Agents need a canonical way to consume apollo_guidance. Ships independently of Apollo having anything to say. |
| 5 | Phase-1 emitter integration (oracle as sole observer for L1, L2, and L3 — all emission in-process) | Enough real traffic to populate graphs. Cortex carries no Apollo emission code; oracle observes the MCP round-trip to it and emits tool_output / tool_error on its behalf. Parallax onboards in a follow-on phase. |
| 6 | Trace propagation (W3C traceparent end-to-end) | Lineage stitching. Can land anytime after Milestone 1, but best coupled with emitter work so emitters adopt it on initial integration. |
| 7 | Admin inspection surface (CRUD endpoints + read-only chat) | Operator visibility before autonomous behavior. Admin must be able to see Apollo's state before Apollo is allowed to change its state. |
| 8 | LLM synthesis engine + graph-anchor drift check | Introduces Apollo's LLM loop. Produces artifact proposals; drift-check gate is in place from the first synthesis commit. |
| 9 | Curator commits + versioning + audit | Turns proposals into committed state. Hard invariants enforced. Every mutation versioned and audited. |
| 10 | Evaluator scoring + L3-performance amplification | Closes the feedback loop — artifacts that stop correlating with good outcomes decay. |
| 11 | Admin chat empowerment (action tools + rationale discussion) | Full admin override: explain/discuss/rollback/forget. Requires Milestones 7–9 as substrate. |
| 12 | Autonomous Curator + production drift prevention | Flip Curator to autonomous. Drift thresholds tuned. Pause/resume wired. Rollback endpoints live. |
| 13 | Maintenance + /stats polish + degraded-emitter reporting | Hourly maintenance job, snapshot coarsening, full metric surface. The ops-readiness milestone. |
| 14 | Subscriber LLM consumption (cortex L3) | Closes the L3 consumption side of the Injection Channel. M3–M5 attached apollo_guidance to MCP dispatches; M14 wires the SDK reads into cortex's tool path so guidance changes downstream LLM prompts at runtime. Locks Q20's contract for L3. Beacon (L1) is deferred — beacon has no HTTP connection to oracle today, so the L1 path is gated on a separate beacon↔oracle wiring decision. |
| 15 | Subscriber LLM consumption (oracle L2) | Closes the L2 consumption side. Oracle's chat surface at POST /api/v1/chat runs its own LLM tool-use loop (oracle/server/llm/tool_executor.py). M15 wires Apollo's in-process for_l2(...) attach plus an ApolloGuidanceCache populated each turn so oracle's chat LLM consumes guidance the same way cortex does — no transport, since oracle hosts Apollo. |
Milestones 0–6 correspond roughly to the design spec's Phase 1 ("Observe and ground"). Milestones 7–10 correspond to Phase 2 ("Synthesize and advise"). M14 + M15 retroactively complete Phase 2's Injection Channel commitment ("guidance reaches every LLM") on the consumption side; oracle's attach side has been live since M3. Milestones 11–13 correspond to Phase 3 ("Empower and maintain").
Milestone 0 — Package scaffolding, settings, dependencies
Purpose. Create the directory layout specified in §Package Structure of the design spec, wire Apollo into oracle's Starlette app as a mounted route prefix, and land the dependency additions.
Scope.
- Create oracle/apollo/ tree per design §Package Structure (empty module files where needed).
- Add oracle/apollo/settings.py exposing every APOLLO_* env var listed across the design spec, each with its default. All other code reads config from here, never from os.environ directly.
- Extend oracle/pyproject.toml with sentence-transformers>=3.0.0 and numpy>=1.24. Confirm no other new dependencies.
- Mount /api/v1/apollo/* route group in oracle/server/__main__.py. For Milestone 0, only mount GET /api/v1/apollo/stats returning {"status": "bootstrapping"} to prove wiring.
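A minimal sketch of what a centralized settings module could look like, assuming a frozen dataclass populated once from the environment. The env var names and defaults shown are the ones stated in the design decisions (Q4, Q10, Q12, Q13, Q17); the class name, field names, helper functions, and the accepted truthy strings are hypothetical.

```python
import os
from dataclasses import dataclass


def _env_int(name, default):
    return int(os.environ.get(name, default))


def _env_bool(name, default):
    raw = os.environ.get(name)
    if raw is None:
        return default
    # Accepted truthy spellings are an assumption of this sketch.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class ApolloSettings:
    ingest_queue_maxsize: int
    ingest_post_timeout_sec: int
    guidance_attach_timeout_ms: int
    audit_retention_days: int
    require_traceparent: bool
    require_intent_schema: bool

    @classmethod
    def from_env(cls):
        # Only a subset of the APOLLO_* variables is shown here.
        return cls(
            ingest_queue_maxsize=_env_int("APOLLO_INGEST_QUEUE_MAXSIZE", 10000),
            ingest_post_timeout_sec=_env_int("APOLLO_INGEST_POST_TIMEOUT_SEC", 30),
            guidance_attach_timeout_ms=_env_int("APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS", 10),
            audit_retention_days=_env_int("APOLLO_AUDIT_RETENTION_DAYS", 90),
            require_traceparent=_env_bool("APOLLO_REQUIRE_TRACEPARENT", False),
            require_intent_schema=_env_bool("APOLLO_REQUIRE_INTENT_SCHEMA", False),
        )
```

Reading the environment in exactly one place keeps the "never from os.environ directly" rule checkable with a simple grep, and a frozen instance prevents later code from mutating configuration at runtime.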
Files created.
- oracle/apollo/__init__.py (stub)
- oracle/apollo/settings.py
- oracle/apollo/observer/__init__.py, events.py, ingest.py (stubs)
- oracle/apollo/memory/__init__.py, store.py (stubs)
- oracle/apollo/learner/__init__.py (stub)
- oracle/apollo/guidance/__init__.py, api.py, attacher.py, selectors.py (stubs)
- oracle/apollo/curator/__init__.py (stub)
- oracle/apollo/evaluator/__init__.py (stub)
- oracle/apollo/chat/__init__.py (stub)
- oracle/apollo/artifacts.py (stub)
- oracle/apollo/llm.py (stub)
- oracle/apollo/templates/ (empty directory — mappings added Milestone 1)
Files modified.
- oracle/pyproject.toml (dependencies)
- oracle/server/__main__.py (mount route group)
Acceptance.
- uv run python -m server starts without error.
- GET /api/v1/apollo/stats returns 200 with a stub JSON body.
- No existing oracle test fails.
Rollback. Delete the oracle/apollo/ directory and revert the two modified files.
Milestone 1 — Observation intake
Purpose. Stand up observation intake: the in-process oracle.apollo.observer.ingest.ingest(...) entry point (the primary path that oracle will use from Milestone 5 onward to emit on behalf of L1, L2, and L3), the POST /api/v1/apollo/observations endpoint (the secondary path for admin replay/seed and any future out-of-process emitter), the shared in-process asyncio queue, the background-worker pool, and Elastic writes to apollo_observations. At end of this milestone, Apollo can receive observations via either path and persist them — nothing more.
Scope.
- oracle/apollo/observer/events.py — Pydantic models for every event type in design §Observation Model (intent_schema, user_prompt, llm_turn, tool_output, tool_error, final_response, user_feedback). All inherit from a common envelope.
- oracle/apollo/observer/ingest.py:
  - async def ingest(envelope) — in-process entry point; validates and enqueues. This is what oracle calls for every Phase-1 event it observes — L1-origin (intent_schema, user_prompt, user_feedback), L2-origin (llm_turn, final_response), and L3-origin (tool_output, tool_error) observed via oracle's MCP round-trip (oracle is Apollo's sole emitter per M5 + §Invariants 14).
  - async def _drain_worker() — background coroutine that reads from the queue and writes to apollo_observations via apollo.memory.store. Worker pool size = APOLLO_INGEST_WORKER_CONCURRENCY.
  - Dedup under at-least-once delivery, keyed on (trace_id, event_type, timestamp, service) within APOLLO_INGEST_DEDUPE_WINDOW_SEC.
- oracle/apollo/guidance/api.py — add POST /observations route. Auth: the existing OAuthMiddleware + require_auth dependency. Request body is {"observations": [envelope, ...]}; response 202 with {"accepted": N}.
- oracle/apollo/memory/store.py — ApolloObservation(Memory(UDS)) class backed by the apollo_observations index.
- oracle/apollo/templates/apollo_observations_mapping.json — index template matching the §Index mappings convention (UDS block, create_ts, update_ts, schema_version, expires_ts, embedding field).
- Register zero-valued counters on startup: apollo_ingest_accepted_total, apollo_ingest_queue_dropped_total, apollo_ingest_post_failure_total, apollo_ingest_queue_depth.
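The enqueue-and-drain shape described in this scope can be sketched as follows. This is an illustration only: the class name, the `sink` callable standing in for the Elastic write, and the choice to report only successfully enqueued envelopes in `accepted` are assumptions. The counter names and the bounded asyncio queue come from the plan.

```python
import asyncio
from collections import Counter

# Counters exist with value zero before any traffic arrives.
INGEST_METRICS = Counter({
    "apollo_ingest_accepted_total": 0,
    "apollo_ingest_queue_dropped_total": 0,
})


class IngestQueue:
    def __init__(self, maxsize, sink):
        self._queue = asyncio.Queue(maxsize=maxsize)
        self._sink = sink  # hypothetical async callable that persists one envelope

    def enqueue(self, envelopes):
        """Non-blocking enqueue; returns the body for the 202 response."""
        accepted = 0
        for envelope in envelopes:
            try:
                self._queue.put_nowait(envelope)
                accepted += 1
            except asyncio.QueueFull:
                # Overflow is counted, never silent.
                INGEST_METRICS["apollo_ingest_queue_dropped_total"] += 1
        INGEST_METRICS["apollo_ingest_accepted_total"] += accepted
        return {"accepted": accepted}

    async def drain_worker(self):
        """Background worker: runs until cancelled."""
        while True:
            envelope = await self._queue.get()
            try:
                await self._sink(envelope)
            finally:
                self._queue.task_done()

    async def join(self):
        await self._queue.join()
```

Because `enqueue` never awaits, the request handler's latency is independent of the persistence layer; a pool is formed by starting `drain_worker` as several tasks.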
ApolloClient addition (axonis-core).
- New file axonis_core/gateway/apollo_client.py — thin httpx.AsyncClient wrapper with emit(envelope) method. Client-side batching (APOLLO_INGEST_BATCH_SIZE, APOLLO_INGEST_FLUSH_INTERVAL_MS), bounded retries, lifecycle flush() on atexit. No Redis, no ML deps, no new top-level dependency — uses httpx which axonis-core already has. This client targets the secondary ingest path only — admin replay/seed and future out-of-process emitters. Phase-1 services (oracle + cortex) never import it; oracle emits in-process (see Milestone 5).
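The bounded retry with exponential backoff and jitter can be sketched independently of httpx. In the sketch below, `send` is a hypothetical async callable that performs one POST and raises on failure; the base delay and the full-jitter strategy are assumptions, while the default of 2 attempts comes from Q4.

```python
import asyncio
import random


async def post_with_retry(send, batch, attempts=2, base_delay=0.2, sleep=asyncio.sleep):
    """Call send(batch) up to `attempts` times, backing off between tries.

    send:  hypothetical async callable that POSTs one batch and raises on failure
    sleep: injectable for tests; defaults to asyncio.sleep
    """
    for attempt in range(attempts):
        try:
            return await send(batch)
        except Exception:
            # A real client would catch transport and timeout errors only.
            if attempt == attempts - 1:
                raise
            # Full jitter: wait a random time up to the exponential ceiling.
            ceiling = base_delay * (2 ** attempt)
            await sleep(random.uniform(0, ceiling))
```

Jitter spreads retries from many emitters over time so that a brief oracle restart does not produce a synchronized burst of re-POSTs.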
Acceptance.
- Unit: envelope validation catches malformed events; dedup window suppresses duplicates.
- Integration: a POST with 50 observations returns 202 in <10 ms and all 50 land in apollo_observations; when the queue is full, the POST still returns 202 and apollo_ingest_queue_dropped_total is incremented; when a worker crashes, the envelope is retried and then written to APOLLO_INGEST_DEAD_LETTER_PATH if set.
- /stats exposes queue depth and per-service apollo_ingest_last_ingest_ts — a single timestamp covering both oracle's in-process enqueues (primary path for Phase-1 emitters) and secondary-path POSTs.
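The dedup-window behavior exercised by the unit acceptance test can be sketched as a small in-memory structure. The key tuple is the one named in the scope; the class name, the dict-shaped envelope, and the linear eviction pass are assumptions chosen for brevity rather than performance.

```python
import time


class DedupWindow:
    """Suppresses envelopes whose key was already seen within the window."""

    def __init__(self, window_sec, clock=time.monotonic):
        self._window = window_sec  # mirrors APOLLO_INGEST_DEDUPE_WINDOW_SEC
        self._clock = clock        # injectable so tests can control time
        self._seen = {}            # key -> time the key was first seen

    def is_duplicate(self, envelope):
        now = self._clock()
        # Drop keys older than the window (O(n) per call; fine for a sketch).
        self._seen = {
            key: ts for key, ts in self._seen.items() if now - ts < self._window
        }
        key = (
            envelope["trace_id"],
            envelope["event_type"],
            envelope["timestamp"],
            envelope["service"],
        )
        if key in self._seen:
            return True
        self._seen[key] = now
        return False
```

A retried POST that resends the same envelope inside the window is dropped on the second arrival, while the same key arriving after the window has elapsed is treated as new.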
Rollback. Revert the route addition and the ApolloClient file. Observations stop being accepted; oracle unaffected.
Milestone 2 — Deterministic graph updates
Purpose. Wire the five Decision Graphs (§Learner → Decision Graphs) so that every ingested observation produces graph mutations deterministically, with no LLM call. This is the grounding layer.
Scope.
- oracle/apollo/learner/extractors.py — rule-based extractors that map observations to (graph_id, nodes_touched, edges_touched, outcome_class). Five specialized extractor paths, one per graph.
- oracle/apollo/learner/graphs.py — DecisionGraph class wrapping the two Elastic indices (apollo_graph_nodes, apollo_graph_edges). Upsert operations with EWMA weight updates (short-window via APOLLO_GRAPH_EWMA_SHORT, long-window via APOLLO_GRAPH_EWMA_LONG).
- oracle/apollo/learner/snapshots.py — periodic snapshot task (default hourly, APOLLO_GRAPH_SNAPSHOT_INTERVAL) writes current state to apollo_graph_snapshots.
- oracle/apollo/learner/trajectory.py — EWMA-based projection of near-future graph state.
- Extend oracle/apollo/observer/ingest.py background worker: after Elastic write of an observation, call into extractors and apply graph mutations.
- In-memory mirror of active graphs for hot-path reads; rebuilt from Elastic on startup.
- Templates: apollo_graph_nodes_mapping.json, apollo_graph_edges_mapping.json, apollo_graph_snapshots_mapping.json.
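The EWMA weight update can be written in a few lines. The recurrence below is the standard exponentially weighted moving average; treating APOLLO_GRAPH_EWMA_SHORT and APOLLO_GRAPH_EWMA_LONG as smoothing factors in (0, 1], the `EdgeWeight` class, and the `trend` property are assumptions of this sketch, since the plan does not specify how those settings are encoded.

```python
def ewma(previous, observation, alpha):
    """Standard EWMA step: a larger alpha reacts faster to new observations."""
    return alpha * observation + (1.0 - alpha) * previous


class EdgeWeight:
    """Tracks a short-window and a long-window average for one graph edge."""

    def __init__(self, short_alpha, long_alpha, initial=0.0):
        self.short_alpha = short_alpha  # assumed meaning of APOLLO_GRAPH_EWMA_SHORT
        self.long_alpha = long_alpha    # assumed meaning of APOLLO_GRAPH_EWMA_LONG
        self.short = initial
        self.long = initial

    def update(self, outcome):
        # outcome: numeric outcome for this observation, e.g. 1.0 success, 0.0 failure
        self.short = ewma(self.short, outcome, self.short_alpha)
        self.long = ewma(self.long, outcome, self.long_alpha)

    @property
    def trend(self):
        # Positive when recent outcomes run above the long-run average.
        return self.short - self.long
```

For example, with `short_alpha=0.5` and `long_alpha=0.25`, two consecutive successes from a zero start give a short average of 0.75 and a long average of 0.4375. Each step depends only on the previous value and the new observation, so a fixed observation sequence always yields the same weights, which is what the "EWMA math matches expected values" acceptance test relies on.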
Acceptance.
- Unit: deterministic extractors are idempotent; running the same observation twice produces identical graph state. EWMA math matches expected values for a known sequence of observations.
- Integration: posting 1000 synthetic observations populates nodes/edges; hourly snapshot task runs and writes to apollo_graph_snapshots; restarting Apollo rebuilds the in-memory mirror from Elastic.
- No LLM call happens on this path.
Rollback. Disable the extractor invocation in the background worker (feature flag on settings); graphs stop receiving updates but existing state is preserved.
Milestone 3 — Guidance attach plumbing (empty artifact set)
Purpose. Wire apollo_guidance into both delivery paths (§Injection Channel) end-to-end, returning an empty-but-well-formed payload. No artifacts yet; this milestone proves the attach mechanism without requiring the synthesis engine.
The L1 attach side wires into oracle/server/api/routes.py chat() handler (POST /api/v1/chat) — oracle's user-facing chat surface, driven by oracle's own LLM tool-use loop. This is distinct from POST /api/v1/apollo/chat (Apollo's admin chat at oracle/apollo/chat/server.py:79), which runs Apollo's separate MiniMax LLM and is not the L1 surface.
Scope.
- oracle/apollo/guidance/attacher.py:
  - def for_l1(user, intent_context) -> dict | None — in-process; returns {"as_of": ..., "artifacts": [], "rationale_summary": ""} or None based on settings. Bounded by APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS.
  - def for_l3_agent(service_name, intent_context, tool_name) -> dict | None — same shape.
- oracle/apollo/artifacts.py — Pydantic models for every artifact type in §Artifact types, used in the payload schema even though no artifacts exist yet.
- Oracle ChatResponse extension in oracle/server/api/routes.py: add apollo_guidance: dict | None = Field(default=None). /chat handler calls attacher.for_l1(...) before constructing the response.
- Oracle MCP dispatcher extension in oracle/server/mcp/server.py and oracle/server/llm/tool_executor.py: before serializing an outbound MCP tool call to an agent-kind L3 service, call attacher.for_l3_agent(...) and inject the result into the tool's arguments dict under apollo_guidance (same pattern as the existing llm_spec injection). Library-kind services are excluded from this injection.
- component_kind field on ServiceRegistry (oracle/server/mcp/registry.py): add an agent | library field to the ToolInfo / registry record, sourced from the service's GET /service-info response or /register POST body. Default to agent if absent (safe default — unknown services treated as agents; oracle will attach guidance they ignore).
- oracle/apollo/guidance/selectors.py — match_artifacts(intent_context, active_set) -> list[Artifact]. At this milestone active_set is always empty, so the function returns an empty list. The implementation exists so Milestone 8 only needs to feed it a non-empty set.
- Timeout handling: if attacher.for_l1 / for_l3_agent overshoots the budget, return None; oracle proceeds without attaching. Counter apollo_guidance_attach_timeout_total increments.
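One way to enforce the attach budget is to wrap the payload construction in a timeout and treat overshoot as "no guidance". The sketch below assumes the payload builder is awaitable, which differs from the synchronous def signatures listed above; `attach_with_budget` and `build_payload` are hypothetical names, and only the counter name and the omit-on-overshoot behavior come from the plan.

```python
import asyncio
from collections import Counter

ATTACH_METRICS = Counter({"apollo_guidance_attach_timeout_total": 0})


async def attach_with_budget(build_payload, timeout_ms):
    """Return the guidance payload, or None if building it overshoots the budget.

    build_payload: hypothetical zero-argument async callable returning the payload dict
    timeout_ms:    mirrors APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS
    """
    try:
        return await asyncio.wait_for(build_payload(), timeout=timeout_ms / 1000)
    except asyncio.TimeoutError:
        # Overshoot omits the field; the surrounding request proceeds normally.
        ATTACH_METRICS["apollo_guidance_attach_timeout_total"] += 1
        return None
```

The caller only has to handle two outcomes, a dict or None, so the /chat handler and the MCP dispatcher can share the same omit-on-None logic.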
Acceptance.
- Integration: /chat response body includes apollo_guidance: {"as_of": ..., "artifacts": [], "rationale_summary": ""} when a caller is identified; null when Apollo is disabled via settings.
- Integration: MCP tool-call dispatch to an agent-kind service carries apollo_guidance inside arguments; dispatch to a library-kind service does not.
- Integration: with APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS=1, field is omitted; request still succeeds.
- No regression in existing oracle tests.
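The agent/library split at dispatch time, and the matching pop on the agent side, can be sketched with plain dicts. The `component_kind` values, the default to agent, and the `apollo_guidance` argument key come from the scope above; the function names and the dict-shaped registry record are hypothetical.

```python
def inject_guidance(arguments, registry_record, guidance):
    """Return the arguments dict to send on an outbound MCP tool call.

    registry_record: hypothetical dict view of a ServiceRegistry entry
    guidance:        attacher result, or None when omitted (disabled or timed out)
    """
    # Unknown services are treated as agents (the safe default from the scope above).
    kind = registry_record.get("component_kind", "agent")
    if kind != "agent" or guidance is None:
        return arguments
    # Copy rather than mutate so the caller's dict is left untouched.
    return {**arguments, "apollo_guidance": guidance}


def pop_guidance(arguments):
    """Agent side: split inbound arguments into (tool arguments, guidance or None)."""
    remaining = dict(arguments)
    guidance = remaining.pop("apollo_guidance", None)
    return remaining, guidance
```

Popping the field before the tool body runs means the tool's own argument validation never sees `apollo_guidance`, so existing tool signatures need no change.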
Rollback. Gate the attacher calls behind APOLLO_GUIDANCE_ATTACH_ENABLED, default false until all of Milestone 3 is merged.
Milestone 4 — Subscriber SDK (ApolloGuidanceCache)
Purpose. Ship the canonical in-process cache that every L1 and L3 agent uses to consume attached apollo_guidance payloads.
Scope.
- New module inside oracle at apollo/sdk/guidance_cache.py. Axonis-core is not touched at this milestone. The SDK is published under apollo.sdk as the canonical source; the file later relocates into axonis_core/apollo/ (the canonical reference per Q15). Subscribers SHOULD import it directly from axonis-core (oracle's path; cortex's path in M14). Vendoring under a subscriber's own namespace remains documented in Q15 as a fallback for any future agent whose dep posture rules out taking on axonis-core, but Phase 1 subscribers all import canonical.
- API: update(apollo_guidance_block) sink + six canonical accessors (design §Subscriber SDK table).
- Ordering: (weight desc, recency desc) when multiple artifacts match.
- Applicability filtering inside the cache.
- Empty-cache fallback: accessors return empty lists / None without blocking.
- Pure Python data structures; no HTTP client, no transport, no ML dependency. No Apollo imports — the SDK is deliberately decoupled from apollo.artifacts so the M5 file move lands cleanly.
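A rough sketch of the cache behavior listed above (full-set replacement, update(None) as a no-op, applicability filtering, and (weight desc, recency desc) ordering). The class name GuidanceCacheSketch, the single matching() accessor, and the artifact fields id / weight / updated_ts / intent_classes are assumptions made for illustration; the six canonical accessors and the real artifact schema are defined in the design spec, not here.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class CachedArtifact:
    # Hypothetical fields; the real artifact schema lives in the design spec.
    artifact_id: str
    weight: float
    updated_ts: float
    intent_classes: frozenset = frozenset()  # empty means "applies to any intent"


class GuidanceCacheSketch:
    """Pure-Python cache: no HTTP client, no transport, no Apollo imports."""

    def __init__(self) -> None:
        self._artifacts: List[CachedArtifact] = []

    def update(self, block: Optional[Dict[str, Any]]) -> None:
        if block is None:
            return  # no-op: the prior cache is preserved
        # Replace the full artifact set; repeating the same block is idempotent.
        self._artifacts = [
            CachedArtifact(
                artifact_id=a["id"],
                weight=float(a.get("weight", 0.0)),
                updated_ts=float(a.get("updated_ts", 0.0)),
                intent_classes=frozenset(a.get("intent_classes", ())),
            )
            for a in block.get("artifacts", [])
        ]

    def matching(self, intent_class: str) -> List[CachedArtifact]:
        """Applicable artifacts, ordered by weight desc, then recency desc."""
        hits = [
            a for a in self._artifacts
            if not a.intent_classes or intent_class in a.intent_classes
        ]
        return sorted(hits, key=lambda a: (-a.weight, -a.updated_ts))
```

An empty cache simply returns an empty list from matching(), so subscribers never block waiting for guidance.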
Acceptance.
- Unit: cache.update() replaces the full artifact set idempotently; update(None) is a no-op (preserves prior cache); accessors filter by intent context; empty cache returns empty results; ordering is stable.
- No new dependency added to axonis-core.
Rollback. The SDK is a single new file (apollo/sdk/guidance_cache.py inside oracle at this milestone); removing it breaks no existing imports (no one depends on it until Milestone 5).
Milestone 5 — Phase-1 emitter integration
Purpose. Wire oracle as the sole observer for Phase-1. Oracle emits on behalf of L1, L2, and L3 via in-process oracle.apollo.observer.ingest calls — no HTTP, no cross-process emission from cortex. Oracle attaches apollo_guidance on every outbound /chat response (L1) and every MCP dispatch bound for an agent-kind L3 service (cortex). The L3 consumption of that attached guidance is wired in M14 (cortex); the L1 consumption side is deferred until a beacon↔oracle connection is designed (see M14 §Out of scope). Through M5–M13 the attach side is live but no subscriber yet reads the field. The observation side is oracle-only.
This milestone operationalizes component.oracle.apollo §Invariants 14 and §Ingest Semantics: neither L1 nor L3 addresses Apollo directly. Both talk to oracle; oracle talks to Apollo.
Scope.
- Oracle — L1 and L2 emissions (unchanged from prior plan).
- In oracle/server/api/routes.py: emit intent_schema (from /chat body if present), user_prompt (from /chat body), final_response (on serialize), user_feedback (on a new feedback endpoint or extended existing one).
- In oracle/server/llm/router.py / tool_executor.py: emit llm_turn on every LLM request/response cycle inside oracle.
- Oracle — L3 emissions (new; replaces per-service emission).
- Add emit_tool_output and emit_tool_error helpers to oracle/apollo/hooks/chat.py, matching the existing emit_user_prompt / emit_llm_turn / emit_final_response shape. Helpers take the caller identity, trace_id, conversation_id, service name, tool name, latency, and the tool input/output (or error) — and call oracle.apollo.observer.ingest.ingest(...) in-process.
- In oracle/server/llm/tool_executor.py: after _call_backend_tool returns (success or raised error), call emit_tool_output or emit_tool_error with the observed result. This is the emission point for every tool dispatch oracle's LLM-use loop makes.
- In oracle/server/mcp/server.py (MCP proxy path): after the proxied dispatch completes, emit the same way — covers MCP tool calls that arrive from external MCP clients and are forwarded to L3 services through oracle.
- All emissions flow through apollo.observer.ingest.ingest(...) — no HTTP, no ApolloClient.
- Cortex — no Apollo emission code at this milestone.
- Ensure cortex's GET /service-info response (or equivalent registration payload) declares "component_kind": "agent" so oracle attaches apollo_guidance on MCP dispatches to it and so Evaluator signal 2 applies.
- At M5, cortex receives apollo_guidance in MCP arguments but does not yet read it — its tool signatures don't declare the field, so FastMCP silently strips it before invocation. Subscriber consumption (cache.update + accessor reads) is wired in M14. Through M5–M13, attach is live and consumption is dark.
- Parallax — deferred from Phase 1. Same observer + subscriber pattern as cortex when it onboards.
- Secondary path (POST /api/v1/apollo/observations). Remains mounted from Milestone 1 for admin replay/seed and for future out-of-process emitters; not exercised by Phase-1 services.
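The L3 emission point described above (observe after the backend call returns or raises, then ingest in-process) might look roughly like the following. The ingest function and OBSERVED list are hypothetical in-memory stand-ins for oracle.apollo.observer.ingest.ingest(...); dispatch_and_observe, the envelope field names, and the choice to re-raise the original error are assumptions of this sketch.

```python
import time
from typing import Any, Awaitable, Callable, Dict, List

# Hypothetical in-memory sink standing in for the in-process ingest call.
OBSERVED: List[Dict[str, Any]] = []


def ingest(envelope: Dict[str, Any]) -> None:
    OBSERVED.append(envelope)


async def dispatch_and_observe(
    call_tool: Callable[[Dict[str, Any]], Awaitable[Any]],
    *,
    trace_id: str,
    service: str,
    tool: str,
    arguments: Dict[str, Any],
) -> Any:
    """Call a backend tool and emit tool_output or tool_error on its behalf."""
    started = time.monotonic()
    base = {"trace_id": trace_id, "service": service, "tool": tool, "input": arguments}
    try:
        result = await call_tool(arguments)
    except Exception as exc:
        latency_ms = (time.monotonic() - started) * 1000.0
        ingest({**base, "kind": "tool_error", "error": repr(exc), "latency_ms": latency_ms})
        raise  # assumption: the caller still sees the original failure
    latency_ms = (time.monotonic() - started) * 1000.0
    ingest({**base, "kind": "tool_output", "output": result, "latency_ms": latency_ms})
    return result
```

Because the observation is written by the dispatcher itself, the L3 service needs no Apollo-facing code for its calls to appear in the lineage.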
Acceptance.
- Integration: a user /chat request produces a full lineage of observations under a single trace_id — all emitted by oracle in-process. The lineage includes user_prompt, oracle llm_turn, tool_output / tool_error for every MCP dispatch oracle made to cortex, and final_response.
- Oracle attaches apollo_guidance to its /chat response and to every outbound MCP dispatch bound for cortex. Subscriber consumption is dark at M5; M14 closes that loop.
- Cortex's source tree contains zero imports of ApolloClient or ApolloIntegration and zero lifespan wiring for Apollo emission. The only Apollo-facing change in cortex at M5 is the component_kind field in its registration payload.
- Full lineage query (Milestone 7) returns every event for the trace.
Rollback. The in-process emission helpers are guarded by the existing APOLLO_EMITTER_ENABLED flag in oracle/apollo/settings.py; flipping it to false disables all oracle-side emission without touching oracle's request path. The secondary HTTP path remains mounted regardless.
Milestone 6 — Trace propagation (W3C traceparent)
Purpose. End-to-end trace stitching via the W3C traceparent header. Every observation oracle emits on behalf of L1, L2, or L3 for a single /chat request carries the same trace_id; every outbound call oracle makes to L3 forwards the same traceparent header so L3's own logs can correlate.
This milestone operationalizes component.oracle.apollo §Trace Propagation under the oracle-sole-observer rule from M5. The wire path is L1 → Oracle → L3 — Apollo is never on the wire. Oracle's in-process emitters stamp the envelope trace_id directly from the ambient value. Outbound calls to L3 carry the header only for L3's own logging correlation; L3 never forwards it to Apollo because L3 never addresses Apollo (§Invariants 14).
Scope.
- Canonical trace module (axonis-core, strictly additive). New axonis/core/trace.py:
- TraceContext + parse_traceparent / format_traceparent / mint_traceparent — W3C-conformant parser with strict validation (reserved all-zero trace-ids rejected, unknown version byte rejected).
- Ambient ContextVar holding the raw 4-segment string; set_current_traceparent, get_current_traceparent, current_trace_id() accessors.
- No new dependency — pure Python, ~20 lines of parsing.
- Ingress: oracle mints on receipt. New oracle/server/middleware/trace.py::TraceparentMiddleware, installed outside OAuthMiddleware in oracle/server/__main__.py:
- Reads APOLLO_TRACE_HEADER (default traceparent) on every non-skip request; skip paths are /health and /service-info.
- Parses via the canonical module. On valid: installs on the ContextVar unchanged (oracle never re-mints mid-request). On missing: mints a replacement + increments apollo_missing_traceparent_total. On malformed: same behavior + increments apollo_malformed_traceparent_total.
- APOLLO_REQUIRE_TRACEPARENT=true (flipped in Phase 3) short-circuits both failure paths with a 400 response.
- Propagation helper. Extend axonis_core.gateway.client.extract_http_headers() to forward traceparent alongside Authorization. Pulls from the inbound headers when provided; otherwise falls back to the ambient ContextVar. This is the single source of truth for any gateway client — MCPClient, RestClient, and the federation layer inherit traceparent forwarding for free without being touched directly.
- Outbound: oracle's direct httpx paths. Oracle's L3-dispatch paths use httpx.AsyncClient directly rather than the gateway clients, so they inject traceparent explicitly:
- oracle/server/llm/tool_executor.py::_call_backend_tool — adds the ambient traceparent to outbound headers on every backend MCP POST.
- oracle/server/mcp/server.py MCP proxy — same.
- In-process emitters carry trace_id directly. apollo/hooks/chat.py helpers already accept a trace_id parameter; oracle/server/api/routes.py::/chat now sources it from current_trace_id() (the ambient value installed by TraceparentMiddleware) and threads it into every helper call. The MCP proxy path does the same, falling back to a locally-minted id only when invoked outside HTTP ingress (e.g., direct programmatic tests). No header is on the primary path — emissions are in-process function calls.
- Secondary-path stamping. ApolloClient.emit() in axonis_core/core/apollo/client.py stamps traceparent from the ambient ContextVar on every POST; the envelope trace_id (caller-set) wins on conflict per §Envelope mapping. Phase-1 services (oracle and cortex) never use this client; only admin replay and out-of-process emitters do.
- Config. APOLLO_TRACE_HEADER (default traceparent) and APOLLO_REQUIRE_TRACEPARENT (default false through Phases 1–2) already exist in oracle/apollo/settings.py; no new env var surface.
- L1 expectation. L1 (beacon, browser clients, any direct /chat caller) is expected to mint traceparent on every new request. Document this contract; do not enforce in best-effort mode (oracle mints on absence and serves the request).
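For orientation, the canonical trace module and the ingress decision described above could be sketched as follows. The function names parse_traceparent, format_traceparent, mint_traceparent, set_current_traceparent, get_current_traceparent, and current_trace_id come from the scope list; everything else (the IngressDecision type, resolve_inbound, minting with flags 01, and incrementing counters in required mode) is an assumption of this sketch rather than the shipped implementation.

```python
import re
import secrets
from contextvars import ContextVar
from dataclasses import dataclass
from typing import Optional

# version - trace-id - parent-id - flags, lowercase hex only (W3C Trace Context).
_PATTERN = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


@dataclass(frozen=True)
class TraceContext:
    version: str
    trace_id: str
    parent_id: str
    flags: str


def parse_traceparent(value: str) -> Optional[TraceContext]:
    """Strict parse; returns None for anything but a valid version-00 header."""
    match = _PATTERN.fullmatch(value)
    if match is None:
        return None
    ctx = TraceContext(*match.groups())
    if ctx.version != "00":
        return None  # unknown version byte rejected
    if ctx.trace_id == "0" * 32 or ctx.parent_id == "0" * 16:
        return None  # reserved all-zero ids rejected
    return ctx


def format_traceparent(ctx: TraceContext) -> str:
    return "-".join((ctx.version, ctx.trace_id, ctx.parent_id, ctx.flags))


def mint_traceparent() -> str:
    # Assumption: minted headers are marked sampled (flags 01).
    return format_traceparent(
        TraceContext("00", secrets.token_hex(16), secrets.token_hex(8), "01")
    )


_current: ContextVar[Optional[str]] = ContextVar("traceparent", default=None)


def set_current_traceparent(value: str) -> None:
    _current.set(value)


def get_current_traceparent() -> Optional[str]:
    return _current.get()


def current_trace_id() -> Optional[str]:
    raw = _current.get()
    ctx = parse_traceparent(raw) if raw else None
    return ctx.trace_id if ctx else None


@dataclass(frozen=True)
class IngressDecision:  # hypothetical helper type for this sketch
    reject: bool                 # True means respond 400
    traceparent: Optional[str]   # value to install on the ContextVar
    counter: Optional[str]       # metric to increment, if any


def resolve_inbound(header: Optional[str], require: bool = False) -> IngressDecision:
    """Valid: pass through unchanged. Missing or malformed: mint, or reject if required."""
    if header is not None and parse_traceparent(header) is not None:
        return IngressDecision(False, header, None)
    counter = (
        "apollo_missing_traceparent_total"
        if header is None
        else "apollo_malformed_traceparent_total"
    )
    if require:
        return IngressDecision(True, None, counter)
    return IngressDecision(False, mint_traceparent(), counter)
```

A middleware would call resolve_inbound once per non-skip request, install the returned traceparent on the ContextVar, and let in-process emitters read current_trace_id() from there.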
Acceptance.
- Integration: /chat → oracle → cortex → oracle. Every observation oracle emits — L1 user_prompt, L2 llm_turn, L3 tool_output / tool_error, L2 final_response — carries the same trace_id, which is the one installed by TraceparentMiddleware. Lineage query stitches them.
- Outbound MCP dispatches from oracle to L3 include a traceparent header with the same trace_id — L3's own logs can correlate against oracle's observations even though L3 never talks to Apollo.
- TraceparentMiddleware mints a replacement and increments apollo_missing_traceparent_total when the inbound header is absent; increments apollo_malformed_traceparent_total when the header is present but malformed (best-effort mode). Returns 400 on either condition in required mode.
- ApolloClient.emit() stamps the ambient traceparent on secondary-path POSTs; envelope trace_id wins on header-vs-envelope conflict.
Rollback. Revert the TraceparentMiddleware installation in server/__main__.py, the extract_http_headers extension, the ApolloClient traceparent stamping, and the outbound header injections in tool_executor.py / mcp/server.py. axonis/core/trace.py stays (no import breaks if nothing calls it). Lineage stops stitching across services but every other pathway — ingest, emit helpers, guidance attachment — works unchanged.
Milestone 7 — Admin inspection surface
Purpose. Operators can see everything Apollo has captured before Apollo is allowed to mutate anything autonomously. Every endpoint in this milestone is admin-only — neither L1 nor L3 ever calls them (§Invariants 14). The only production-path traffic Apollo serves is the response-attached guidance payload and the secondary-path POST /observations; everything under /api/v1/apollo/memories|artifacts|guidance|audit|stats|chat in this milestone is admin inspection.
Scope.
- REST endpoints (all admin-only):
- GET /api/v1/apollo/memories, GET /memories/{id}, POST /memories (seed), PATCH /memories/{id}, DELETE /memories/{id}
- GET /api/v1/apollo/artifacts — returns empty today; populated Milestone 8+
- GET /api/v1/apollo/guidance?scope=l1 and ?scope=l3:<service> — preview the currently-attachable set (empty today)
- GET /api/v1/apollo/subscribers — list currently-connected admin SSE debug streams
- GET /api/v1/apollo/guidance/stream?scope=<scope> — admin-only SSE debug feed (use cortex's event_stream.py as reference)
- GET /api/v1/apollo/audit — returns empty today; populated Milestone 9
- GET /api/v1/apollo/stats — populated with all counters registered so far
- oracle/apollo/chat/server.py — read-only admin chat (can inspect observations / lineage / graphs; cannot yet act). Uses Apollo's LLM (see Milestone 8) — or, until Milestone 8, a stub that only runs list_memories and get_memory tools.
- oracle/apollo/chat/tools.py — read-only tool set for this milestone.
- Auth: every endpoint gated by role admin via oracle's guardrails.
Acceptance.
- Admin can query observations by trace_id, inspect graph state, and read /stats.
- Non-admin callers receive 403 on every admin endpoint.
- Admin SSE debug feed emits per Curator commit (no commits yet — but the wiring is proven with a synthetic emit).
Rollback. Remove the endpoint registrations; inspection is lost but no data is affected.
Milestone 8 — LLM synthesis + graph-anchor drift check
Purpose. Apollo's LLM runs. It proposes artifact mutations. The graph-anchor drift check gates every proposal. Nothing is committed autonomously yet — proposals go to a pending-review queue.
Scope.
- oracle/apollo/llm.py — Apollo's LLM client. Three providers ship in M8:
- openai (production default) — the existing openai SDK pointed at any OpenAI-compatible endpoint via APOLLO_LLM_BASE_URL (e.g., MiniMax's hosted endpoint). No new dependency.
- minimax-local (scaffolded; dev / air-gapped) — lazy HuggingFace transformers load of the stock MiniMax checkpoint using the canonical model-card signature:
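```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Lazy load of the stock checkpoint; weights resolve from the standard HF cache.
tokenizer = AutoTokenizer.from_pretrained("MiniMaxAI/MiniMax-M2.7", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("MiniMaxAI/MiniMax-M2.7", trust_remote_code=True)
```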
Weights are resolved from the standard HF cache at ${HF_HOME:-~/.cache/huggingface}/hub/models--MiniMaxAI--MiniMax-M2.7/. See design spec §Apollo's LLM → Local MiniMax via HuggingFace for the full contract (pre-pull pattern, disk/GPU expectations, deferred production knobs like APOLLO_LLM_LOCAL_MODEL_PATH, device mapping, thread-pool offload, streaming). M8 ships the scaffold only — the deferred knobs are explicitly out of scope for this milestone and tracked under §Out of scope for this plan.
- stub (tests + bootstrapping) — canned responses registered via LLMClient.install_stub_response(...); no network, no GPU, no model dep. Used throughout the test suite to drive synthesis deterministically.
The minimax alias is accepted as a synonym for openai (with MiniMax's endpoint as the assumed base URL) so APOLLO_LLM_PROVIDER=minimax stays meaningful; the in-code dispatch funnels minimax and openai through the same provider path.
- oracle/apollo/learner/synthesis.py — event-driven dispatcher. Triggers fire off observations oracle has already ingested into the in-process queue (component.oracle.apollo §Invariants 14: oracle is the sole Phase-1 emitter, so "L1 observation ingested" means "oracle emitted an L1-origin envelope on L1's behalf"). Triggers (design §Synthesis triggers):
- L1-origin observation ingested (intent_schema / user_prompt) — emitted by oracle from /chat request body
- L3-origin observation ingested (tool_output / tool_error / final_response) — emitted by oracle after observing the MCP round-trip (or the /chat serialization point for final_response)
- Admin chat turn
- Admin-initiated synthesis via POST /api/v1/apollo/learn
- Concurrency bounded by APOLLO_SYNTHESIS_MAX_CONCURRENT; trace_id coalescing.
- oracle/apollo/learner/prompts.py — prompt templates for synthesis (intent classification, failure-pattern extraction, etc.).
- oracle/apollo/learner/drift.py — graph-anchor drift check. Every LLM output passes through four checks (proposed-pattern-vs-edges, intent-classification-vs-clusters, weight swings, trajectory coherence). Divergent proposals produce a DriftEvent and enter the pending-review queue; consistent proposals are approved for commit (commit itself happens in Milestone 9).
- Write artifact proposals to a new apollo_artifact_proposals transient store — or directly to apollo_artifacts with a status: "pending_admin_review" | "approved" field. Pick one; the design spec does not mandate.
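The dispatcher's two scheduling rules (bounded concurrency and trace_id coalescing) can be illustrated with an asyncio.Semaphore and a map of in-flight tasks. SynthesisDispatcher and its trigger() method are hypothetical names for this sketch; max_concurrent plays the role of APOLLO_SYNTHESIS_MAX_CONCURRENT, and "coalescing" is assumed here to mean that a second trigger for a trace already being synthesized joins the in-flight run.

```python
import asyncio
from typing import Awaitable, Callable, Dict


class SynthesisDispatcher:
    """Runs at most max_concurrent syntheses; one in-flight run per trace_id."""

    def __init__(
        self,
        synthesize: Callable[[str], Awaitable[None]],
        max_concurrent: int,
    ) -> None:
        self._synthesize = synthesize
        self._sem = asyncio.Semaphore(max_concurrent)
        self._inflight: Dict[str, "asyncio.Task[None]"] = {}

    def trigger(self, trace_id: str) -> "asyncio.Task[None]":
        """Must be called from a running event loop."""
        existing = self._inflight.get(trace_id)
        if existing is not None and not existing.done():
            return existing  # coalesce: join the run already in flight
        task = asyncio.get_running_loop().create_task(self._run(trace_id))
        self._inflight[trace_id] = task
        task.add_done_callback(lambda t, key=trace_id: self._forget(key, t))
        return task

    def _forget(self, key: str, task: "asyncio.Task[None]") -> None:
        # Only drop the entry if it still points at the task that finished.
        if self._inflight.get(key) is task:
            del self._inflight[key]

    async def _run(self, trace_id: str) -> None:
        async with self._sem:  # bounds concurrent synthesis calls
            await self._synthesize(trace_id)
```

Whether later triggers for the same trace should instead queue a follow-up run is a design choice the spec's §Synthesis triggers would settle; this sketch takes the simpler join-in-flight reading.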
Acceptance.
- Integration: a synthetic sequence of L3 tool_error observations triggers an LLM call; the LLM proposes a FailurePattern; graph-anchor check validates against the outcome_graph; if consistent, proposal is marked approved and ready for commit; if not, produces a DriftEvent.
- Unit: drift-check unit tests cover each of the four checks.
- LLM swap test: setting APOLLO_LLM_PROVIDER=openai and APOLLO_LLM_MODEL=gpt-4 changes provider with no code change.
Rollback. Disable the synthesis trigger entries; deterministic graph updates continue, no proposals generated.
Milestone 9 — Curator commits + versioning + audit
Purpose. Approved proposals become committed artifacts. Every mutation is versioned and audited. Hard invariants enforced.
Scope.
- oracle/apollo/curator/actions.py — promote, demote, forget, edit, compact. Each wraps the mutation + history-write + audit-write as one atomic unit.
- oracle/apollo/curator/policy.py — hard invariants (§Curator → Disallowed actions). Every action first passes through the policy gate.
- oracle/apollo/curator/audit.py — writes apollo_audit records. Schema per design §Audit log: action, actor, trigger, artifact_id, before_version_id, after_version_id, evaluator_score, score_decomposition, upstream_artifact_ids, rationale (required, non-empty), evidence_ref, indefinite, admin_note.
- Templates: apollo_artifacts_mapping.json, apollo_artifact_history_mapping.json, apollo_audit_mapping.json.
- Rollback endpoint: POST /api/v1/apollo/artifacts/{id}/rollback with target version. Rollback is itself a versioned + audited event.
- Admin-only write endpoints: POST /artifacts/{id}/promote, POST /artifacts/{id}/demote, DELETE /artifacts/{id}.
- Rationale generation: LLM-synthesized for LLM-driven actions; templated from score decomposition for deterministic Evaluator actions (see Milestone 10).
- At this milestone, autonomous Curator is disabled. Every mutation requires an admin trigger via the chat or admin endpoints. (Milestone 12 flips this to autonomous.)
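One way to picture "policy gate first, then mutation + history-write + audit-write as one unit" is the in-memory sketch below. InMemoryStore stands in for the apollo_artifacts, apollo_artifact_history, and apollo_audit indices; the promote signature, the integer version fields, and the status values are illustrative assumptions, and real atomicity across Elasticsearch indices would need more than this.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


class CuratorPolicyViolation(Exception):
    """Raised by the policy gate before any state is touched."""


@dataclass
class InMemoryStore:
    # Stand-ins for apollo_artifacts, apollo_artifact_history, apollo_audit.
    artifacts: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    history: List[Dict[str, Any]] = field(default_factory=list)
    audit: List[Dict[str, Any]] = field(default_factory=list)


def promote(
    store: InMemoryStore,
    artifact_id: str,
    *,
    actor: str,
    rationale: str,
    policy_gate: Callable[[str, str], None],
) -> Dict[str, Any]:
    # 1. Policy gate runs first; a violation raises before anything is written.
    policy_gate("promote", artifact_id)
    if not rationale.strip():
        raise ValueError("rationale is required and must be non-empty")

    # 2. Build every record up front so a failure here leaves the store untouched.
    before = store.artifacts[artifact_id]
    after = dict(
        before,
        version=before["version"] + 1,
        prev_version=before["version"],
        status="promoted",  # hypothetical status value
    )
    audit_record = {
        "action": "promote",
        "actor": actor,
        "artifact_id": artifact_id,
        "before_version_id": before["version"],
        "after_version_id": after["version"],
        "rationale": rationale,
    }

    # 3. Apply the three writes together.
    store.history.append(before)
    store.artifacts[artifact_id] = after
    store.audit.append(audit_record)
    return after
```

A rollback action would follow the same shape: it writes a new version and its own audit record rather than deleting anything.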
Acceptance.
- Integration: admin promotes an artifact; the prior version lands in apollo_artifact_history; current version in apollo_artifacts; an audit record with non-empty rationale lands in apollo_audit.
- Integration: admin rolls back; rollback writes a new version whose prev_version_id points at the pre-rollback state; rollback itself is audited.
- Unit: every disallowed action raises CuratorPolicyViolation; no mutation occurs; no audit record is written.
Rollback. Disable the write endpoints; proposals sit in the pending queue without ever being committed.
Milestone 10 — Evaluator scoring + L3 performance amplification
Purpose. Close the feedback loop. Per-artifact rolling scores drop as outcomes degrade. L3 performance signals carry amplified weight.
Scope.
- oracle/apollo/evaluator/signals.py — detectors for the four failure signals (L3 error, L3 schema mismatch, user feedback, evaluator confidence). Signal 2 gated on component_kind == "agent" of the observed service (the envelope's service field) — oracle is the actual emitter under oracle-sole-observer, but the component-kind contract keys on the L3 target oracle observed. Look up the kind from ServiceRegistry at signal-application time.
- oracle/apollo/evaluator/scoring.py — rolling EMA per artifact. Weight tiers APOLLO_EVALUATOR_WEIGHT_L3_ERROR (3.0), APOLLO_EVALUATOR_WEIGHT_SCHEMA_MISMATCH (3.0), APOLLO_EVALUATOR_WEIGHT_USER_FEEDBACK (1.5), APOLLO_EVALUATOR_WEIGHT_CONFIDENCE (0.5).
- oracle/apollo/evaluator/cascade.py — when an L3-dominant score drop occurs, flag upstream IntentPattern / PromptShim / SpecFragment for review on the next synthesis trigger.
- Demotion cadence: normal N=5 cycles, L3-dominant N=2 (APOLLO_EVALUATOR_L3_FAST_DEMOTE_N). The Curator reads these thresholds when recommending demote actions (still admin-triggered at this milestone).
- Repeated L3 failures on the same artifact within a short window escalate to a DriftEvent rather than silent demotion.
- Score decompositions preserved on every audit record so admins can see why a score moved.
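To make the weight tiers and demotion cadence concrete, here is one possible reading of "rolling EMA with amplified L3 weight". The tier values (3.0, 3.0, 1.5, 0.5) and the N=5 / N=2 cadences come from the scope above; the update formula itself, the alpha value, and the definition of "L3-dominant" (L3 signal weight exceeding all other signal weight in the window) are assumptions of this sketch, since the plan does not spell them out.

```python
from typing import Iterable, Optional

# Weight tiers from the scope list (APOLLO_EVALUATOR_WEIGHT_* defaults).
WEIGHTS = {
    "l3_error": 3.0,
    "schema_mismatch": 3.0,
    "user_feedback": 1.5,
    "confidence": 0.5,
}
L3_SIGNALS = {"l3_error", "schema_mismatch"}


def ema_step(score: float, signal: Optional[str], alpha: float = 0.1) -> float:
    """One EMA update of an artifact score in [0, 1].

    signal=None is a good outcome (target 1.0). A failure signal pulls the
    score toward 0.0 with a step size scaled by the signal's weight tier, so
    L3 failures move the score faster than low-weight signals.
    """
    if signal is None:
        return score + alpha * (1.0 - score)
    step = min(1.0, alpha * WEIGHTS[signal])
    return score + step * (0.0 - score)


def demote_after_cycles(signals: Iterable[str], normal_n: int = 5, l3_fast_n: int = 2) -> int:
    """Cycles below threshold before a demote recommendation.

    Assumption: a window is "L3-dominant" when the summed weight of L3
    signals exceeds the summed weight of all other signals.
    """
    signals = list(signals)
    l3 = sum(WEIGHTS[s] for s in signals if s in L3_SIGNALS)
    other = sum(WEIGHTS[s] for s in signals if s not in L3_SIGNALS)
    return l3_fast_n if l3 > other else normal_n
```

Under these assumptions a single L3 error moves a score six times further than a single low-confidence signal, which is the amplification the milestone is after.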
Acceptance.
- Integration: synthetic L3 errors on traces that used pshim_xyz drive the artifact's rolling score below threshold; after N=2 cycles the Curator recommends demotion (visible in admin chat); upstream artifacts are flagged for synthesis review.
- Unit: weight math matches design-spec tiers; signal 2 is dark for events observed from library-kind services.
Rollback. Disable Evaluator runs on the ingest worker; scores stop updating; existing scores preserved.
Milestone 11 — Admin chat empowerment
Purpose. Full conversational admin surface: explain / discuss / act. Admin's chain of reasoning is preserved in the audit log alongside Apollo's.
Scope.
- Extend oracle/apollo/chat/tools.py with the full tool set (design §Admin Chat):
- list_memories, get_memory, forget_memory
- promote_artifact, demote_artifact, rollback_artifact, forget_artifact
- rollback_graph, trigger_synthesis
- explain_decision(trace_id | artifact_id | audit_id) — retrieves stored rationale and resolves evidence_ref
- list_decisions, discuss_decision — conversational review of Curator actions
- pause_curator, resume_curator
- Admin actions via chat are audited with actor: "admin:<username>" and a fresh rationale capturing the admin's reasoning (or the tool-specific default).
- Private admin-chat MCP endpoint (mounted by oracle.apollo.chat.server) exposes these tools to Apollo's LLM; not aggregated into oracle's user-facing /agentspace catalog.
- indefinite: true flag wiring for critical admin actions (forget, pause/resume, rollback) — writes audit records with null expires_ts.
Acceptance.
- Admin chat test: admin asks "why did you demote pshim_xyz?"; Apollo's LLM calls explain_decision, retrieves the audit record, and presents the rationale + evidence in prose.
- Admin chat test: admin says "roll it back"; Apollo's LLM calls rollback_artifact; audit record with actor: "admin:<username>" and indefinite: true is written; injection-path payload on the next request reflects the rollback.
- Non-admin users receive 403 on /api/v1/apollo/chat.
Rollback. Revert to the read-only tool set; admin can inspect but not act.
Milestone 12 — Autonomous Curator + drift prevention tuning
Purpose. Flip the Curator to autonomous for evolution-class proposals. Drift-class proposals still require admin review. Pause/resume broadcast works. Drift thresholds tuned based on Milestone 8–11 production data.
Scope.
- Remove the "every mutation requires admin trigger" gate from Milestone 9. Curator now commits Evolution proposals autonomously after graph-anchor check passes.
- pause_curator() / resume_curator() set a process-wide flag; while paused, Curator refuses all mutations (even admin-triggered), and the admin-SSE debug feed emits a broadcast event with the pause status.
- Drift thresholds (z-score on weight deltas, rate-of-new-nodes caps, divergence tolerance) become production-tuned. Defaults proposed in the design spec are reasonable starting points; admin can tune via APOLLO_DRIFT_* settings.
- Repeated-L3-failure escalation to DriftEvent is wired end-to-end (visible in the admin audit feed).
- LLM-driven compaction of expiring observations (default: triggered on admin-initiated synthesis; can be fully autonomous if admin enables APOLLO_COMPACTION_AUTO).
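As an illustration of the first drift threshold named above (z-score on weight deltas), a proposed edge-weight change can be compared against the history of recent deltas. The function names, the default threshold of 3.0, and the handling of short or constant histories are assumptions made for this sketch; the real defaults are the APOLLO_DRIFT_* settings in the design spec.

```python
import math
from statistics import mean, pstdev
from typing import Optional, Sequence


def weight_delta_zscore(history: Sequence[float], proposed: float) -> Optional[float]:
    """z-score of a proposed weight delta against past deltas.

    Returns None when there is too little history to judge (assumption:
    fewer than two samples).
    """
    if len(history) < 2:
        return None
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0.0:
        # Constant history: any different delta is infinitely surprising.
        return 0.0 if proposed == mu else math.copysign(math.inf, proposed - mu)
    return (proposed - mu) / sigma


def is_drift(history: Sequence[float], proposed: float, z_threshold: float = 3.0) -> bool:
    """True when the proposed delta is an outlier relative to history."""
    z = weight_delta_zscore(history, proposed)
    return z is not None and abs(z) > z_threshold
```

A proposal flagged this way would be routed to the pending queue as a DriftEvent rather than committed autonomously.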
Acceptance.
- Integration: evolution-class proposals commit without admin intervention; audit records show actor: "curator_auto".
- Integration: drift-class proposals produce DriftEvent and sit in the pending queue; admin must approve via chat.
- Integration: pause_curator halts all mutations; queued proposals remain visible in /stats and stay untouched until resume_curator.
Rollback. Gate autonomous commit behind APOLLO_CURATOR_AUTONOMOUS=false; Apollo reverts to Milestone 9 behavior.
Milestone 13 — Maintenance + /stats polish + degraded emitters
Purpose. Ops readiness. Hourly maintenance, snapshot coarsening, and a comprehensive /stats surface that exposes every counter plus per-service health.
Scope.
- oracle/apollo/maintenance.py — periodic background job (APOLLO_MAINTENANCE_INTERVAL, default 1 h) that:
1. Runs axonis_core.elastic.Elastic.delete_by_query against every index to remove documents where expires_ts < now().
2. Coarsens apollo_graph_snapshots: delete hourly snapshots older than 7 days if a daily snapshot exists for that day; delete daily snapshots older than 30 days if a weekly snapshot exists.
3. Emits job metrics (apollo_maintenance_last_run_ts, apollo_maintenance_docs_deleted_total).
- /stats surface expanded to include per-service degraded_emitters array. A service is "degraded" when its apollo_ingest_last_ingest_ts{service} is older than APOLLO_INGEST_STALE_WARN_SEC, or when queue/lag thresholds are breached. Under oracle-sole-observer (§Invariants 14), for Phase-1 services this means oracle stopped observing them — e.g., oracle hasn't dispatched an MCP call to cortex in five minutes — not that a POST to Apollo failed. For secondary-path emitters (admin replay / out-of-process services), it means their POSTs stopped arriving.
- Admin audit surface: GET /api/v1/apollo/audit query filters (time range, action, actor, artifact id, artifact type, trigger, score-decomposition terms).
- intent_schema_coverage stat (design §Layer 1 Intent Schema Obligation) — percentage of traces with intent_schema in the last rolling window. Computed from oracle's in-process emissions (the only source of intent_schema observations in Phase 1).
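The snapshot-coarsening rule in the maintenance job can be expressed as a pure selection function, which keeps it easy to unit-test apart from Elasticsearch. The Snapshot type and snapshots_to_delete name are hypothetical; the sketch also assumes that "a weekly snapshot exists" means a weekly snapshot in the same ISO week as the daily one, which the plan does not state explicitly.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List


@dataclass(frozen=True)
class Snapshot:  # hypothetical shape of an apollo_graph_snapshots record
    snapshot_id: str
    tier: str  # "hourly" | "daily" | "weekly"
    ts: datetime


def snapshots_to_delete(snapshots: Iterable[Snapshot], now: datetime) -> List[str]:
    """Select snapshot ids that the coarsening rule allows to be removed."""
    snapshots = list(snapshots)
    daily_days = {s.ts.date() for s in snapshots if s.tier == "daily"}
    # (ISO year, ISO week) pairs covered by a weekly snapshot.
    weekly_weeks = {tuple(s.ts.isocalendar()[:2]) for s in snapshots if s.tier == "weekly"}

    doomed: List[str] = []
    for s in snapshots:
        age = now - s.ts
        if s.tier == "hourly":
            # Hourly older than 7 days goes only if a daily covers that day.
            if age > timedelta(days=7) and s.ts.date() in daily_days:
                doomed.append(s.snapshot_id)
        elif s.tier == "daily":
            # Daily older than 30 days goes only if a weekly covers that week.
            if age > timedelta(days=30) and tuple(s.ts.isocalendar()[:2]) in weekly_weeks:
                doomed.append(s.snapshot_id)
    return doomed
```

Because a finer snapshot is only removed when a coarser one covers the same period, a gap in daily or weekly snapshots never causes data loss.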
Acceptance.
- Maintenance job runs on schedule; expired observations are removed via delete_by_query; snapshot coarsening works across the three tiers.
- /stats exposes every counter named in the design spec's metrics tables, including apollo_ingest_last_ingest_ts for each Phase-1 service.
- degraded_emitters correctly flags a Phase-1 service when oracle hasn't observed it within the stale window.
- intent_schema_coverage updates correctly with a rolling window.
Rollback. Disable the maintenance scheduler; manual delete_by_query still possible via admin endpoint.
Milestone 14 — Subscriber LLM consumption (cortex L3)
Purpose. Close the L3 consumption side of the Injection Channel. Through M5–M13, oracle attaches apollo_guidance to every outbound MCP dispatch bound for an agent-kind L3 service, but cortex's MCP tool signatures don't declare the field, so FastMCP silently strips it before the handler runs. The brain is thinking; the body isn't listening. M14 wires the consumption side into cortex per Q20's locked contract. After M14, integration tests prove guidance changes the system prompt of a live LLM call inside a cortex tool rather than being attached and discarded.
Beacon (L1) is out of scope for this milestone. Beacon has no HTTP connection to oracle today (MCP_SERVER_URL defaults to http://localhost:8000/mcp, which is cortex direct). Until a beacon↔oracle connection is designed and wired (separate spec decision), apollo_guidance has no path into beacon's process. M14 ships the L3 reference implementation; the L1 wiring follows the same SDK pattern once beacon talks to oracle.
Scope — Cortex (L3).
- Add axonis-core>=0.1.0 to cortex/pyproject.toml and install editable from ../axonis-core. Cortex imports the SDK directly from the canonical location (from axonis.core.apollo.guidance_cache import ApolloGuidanceCache), matching oracle's import pattern. Single source of truth, no drift-management overhead. (M14 development briefly explored vendoring the module locally to keep cortex lightweight, but cortex was already pulling from axonis.core.llm import LLMSpec for its narrative tool, so the lightweight-agent argument didn't survive contact with the codebase. Vendoring remains documented in Q15 as an option for future subscribers whose dep posture differs.)
- Replace the design-note stub at cortex/cortex/server/app.py:40–43 with executable wiring. The constraint "cortex never addresses Apollo directly" remains true; M14 only adds consumption of guidance that oracle has already attached.
- MCP handler integration: in cortex/cortex/server/mcp_handler.py:_handle_tools_call, before calling call_tool(tool_name, arguments):
- Pop apollo_guidance = arguments.pop("apollo_guidance", None) from the inbound arguments. This both prevents downstream tool signatures from receiving an unexpected kwarg and isolates the guidance for cache update.
- Instantiate a request-scoped ApolloGuidanceCache (imported from axonis.core.apollo.guidance_cache), call cache.update(apollo_guidance).
- Expose the cache to tool implementations via a ContextVar at cortex.session.apollo_cache (get_cache() / set_cache() / reset_cache() / populate_from_arguments() helpers) so tools that internally run an LLM can read accessors without the cache leaking across requests. The handler's finally block calls reset_cache(token) so failure paths still clean up.
- Per-tool LLM call augmentation: for any cortex tool that internally issues an LLM call, fold cache.get_system_prompt_additions(intent_context) into the tool's system prompt and cache.get_tool_description_overrides(...) into its tool catalog before invocation. Tools that do not call an LLM internally simply ignore the cache; the contextvar is read-only and harmless when unread.
- Cache lifetime: request-scoped. Created at the top of _handle_tools_call; the contextvar resets when the handler returns or the tool raises. No cross-request leakage.
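The handler wiring described in this scope (pop the field, populate a request-scoped cache, expose it through a ContextVar, reset in finally) reduces to a small pattern. In the sketch below, _StubCache stands in for ApolloGuidanceCache, handle_tools_call is a simplified synchronous stand-in for cortex's _handle_tools_call, and treating a non-dict payload as absent is an assumption about how "malformed" guidance is handled.

```python
from contextvars import ContextVar
from typing import Any, Callable, Dict, List, Optional


class _StubCache:
    """Minimal stand-in for ApolloGuidanceCache (update sink only)."""

    def __init__(self) -> None:
        self.artifacts: List[Dict[str, Any]] = []

    def update(self, block: Optional[Dict[str, Any]]) -> None:
        if block is None:
            return  # no-op, cache stays empty
        self.artifacts = list(block.get("artifacts", []))


_cache_var: ContextVar[Optional[_StubCache]] = ContextVar("apollo_cache", default=None)


def get_cache() -> Optional[_StubCache]:
    return _cache_var.get()


def handle_tools_call(
    call_tool: Callable[[str, Dict[str, Any]], Any],
    tool_name: str,
    arguments: Dict[str, Any],
) -> Any:
    # Pop the field so the tool never receives an unexpected kwarg.
    guidance = arguments.pop("apollo_guidance", None)

    cache = _StubCache()
    # Assumption: anything that is not a dict is treated as absent guidance.
    cache.update(guidance if isinstance(guidance, dict) else None)

    token = _cache_var.set(cache)
    try:
        return call_tool(tool_name, arguments)
    finally:
        # Runs on success and on failure, so the cache never outlives the request.
        _cache_var.reset(token)
```

Tools that run an LLM internally would call get_cache() inside their body; tools that do not can ignore it entirely.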
Tests.
- Cortex unit: call _handle_tools_call with arguments containing apollo_guidance; assert the field is removed from arguments before the tool runs, and the contextvar holds a populated cache during the tool's execution and is reset afterward.
- Cortex unit (failure posture): call _handle_tools_call with arguments lacking apollo_guidance and with apollo_guidance: None; assert no error is raised, the contextvar holds an empty cache, and accessors return empty lists / None.
- Cortex integration: for a cortex tool that runs an LLM internally (or a thin test tool that records the system prompt it was given), assert the prompt contains each PromptShim's content.text from the attached apollo_guidance when the cache is populated, and is unchanged when guidance is absent.
Acceptance.
- Cortex's MCP tool handler removes apollo_guidance from arguments before tool dispatch, populates a request-scoped cache, and exposes it to tool bodies via a contextvar.
- Tests prove that an L3 LLM call inside a cortex tool observably changes when guidance is present vs. absent.
- Oracle's attach behavior is unchanged; M3, M5, M11, M13 tests still pass.
- Q20's failure posture is exercised: missing/None/malformed apollo_guidance is a no-op; the LLM call proceeds without guidance.
Rollback. Revert the per-service patches in cortex. Oracle's attach side has tolerated non-consumers since M3 (Q13 — attach budget overshoot omits the field without failure), so reverting M14 leaves the system in its M13 state with no functional regression at oracle.
Out of scope (this milestone).
- Beacon (L1) integration. Deferred until a beacon↔oracle connection is designed. Beacon's MCP_SERVER_URL defaults to cortex direct, so attached apollo_guidance has no path into beacon's process today. Once that connection lands, beacon's L1 wiring follows the same SDK pattern (session-scoped cache + accessor reads before the upstream LLM call).
- Parallax integration. Same wiring pattern as cortex (MCP handler argument pop + request-scoped cache contextvar) when it onboards. Tracked as a follow-on phase per the revised Q7 lineup.
- Other L1/L3 services. Once cortex is the reference implementation for the Q20 contract, additional subscribers (titan, athena, testament, rest/fedai-rest, parallax, beacon) onboard by following the same pattern. No Apollo-side spec change required beyond the eventual beacon↔oracle connection design.
Milestone 15 — Subscriber LLM consumption (oracle L2)
Purpose. Close the L2 consumption side of the Injection Channel. Oracle's chat surface at POST /api/v1/chat runs its own LLM tool-use loop in oracle/server/llm/tool_executor.py (5-provider gateway: anthropic / openai / groq / ollama / trinity). Apollo's MiniMax synthesis LLM is independent of this; the L2 path makes Apollo's guidance available to oracle's own chat LLM, the same way M14 made it available to cortex's. After M15, integration tests prove guidance changes the system prompt of oracle's tool-executor LLM call rather than being computed and discarded.
Scope — Oracle (L2).
- Add for_l2(...) to oracle/apollo/guidance/attacher.py, with the same shape as for_l1 and for_l3_agent. Returns the AttachedGuidance payload bounded by APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS. scope_label="l2". Records attribution under trace_id so the Evaluator correlates oracle-LLM signals back to the artifacts that shaped them, exactly as L1/L3 do.
- Process-local ApolloGuidanceCache for oracle's chat LLM. Oracle owns one ApolloGuidanceCache instance (imported from axonis.core.apollo.guidance_cache). The cache is populated each turn from for_l2(...); populate-then-read happens in-process, with no JSON serialisation and no envelope traversal. A ContextVar at oracle.server.llm.apollo_cache (get_cache() / set_cache() / reset_cache() / populate_for_turn() helpers) keeps per-request isolation so the cache cannot leak across concurrent /chat requests.
- Tool-executor integration: in oracle/server/llm/tool_executor.py, before each LLM turn:
- Call for_l2(user, intent_class, caller_tags, trace_id) to compute the applicable guidance for the current turn (or read it from the request-scoped cache if already populated upstream by the route handler).
- Fold cache.get_system_prompt_additions(intent_context) into the system prompt and cache.get_tool_description_overrides(...) into the tool catalog rendering before invoking the configured provider.
- After tool dispatch returns, consult cache.get_tool_pairing_hints(current_tool) to surface follow-up suggestions to the LLM if the configured provider supports tool nudges.
- Cache lifetime: request-scoped. Populated at the top of /chat request handling; the contextvar resets when the request returns or the handler raises. The cache holds the same artifact set across every turn of a single tool-use loop (no re-fetch per turn) so artifact applicability decisions don't shift mid-loop.
- Failure posture. Identical to L1/L3: cache miss / for_l2 timeout / accessor exception → tool-executor proceeds with no guidance applied. The /chat request still succeeds. Counters increment (apollo_guidance_attach_timeout_total{scope="l2"}), no exception bubbles up.
Tests.
- Oracle unit (attacher): for_l2(user="...", intent_class="...", trace_id="...") returns the same AttachedGuidance shape as for_l1 / for_l3_agent; honors APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS; records attribution under the l2 scope.
- Oracle unit (cache): oracle.server.llm.apollo_cache contextvar isolates concurrent /chat requests; reset-on-return is enforced even when the request raises.
- Oracle integration: with a populated guidance set including a PromptShim, the system prompt sent to the configured tool-executor provider observably grows; with for_l2 returning None, the prompt is unchanged. Same shape of assertion as M14's cortex integration test.
- Oracle failure posture: with APOLLO_GUIDANCE_ATTACH_ENABLED=false or with the attacher raising, /chat still serves the request, the LLM call still happens, no apollo_guidance is folded in.
Acceptance.
- Oracle's tool-executor consumes guidance via ApolloGuidanceCache on every /chat turn.
- Tests prove that an oracle-side LLM call observably changes when guidance is present vs. absent, mirroring M14's L3-side proof.
- Oracle's existing M3 attach behavior on /chat responses (L1 attach) is unchanged; M3, M5, M11, M13 tests still pass.
- Apollo's MiniMax LLM at /api/v1/apollo/chat is unchanged. The two LLMs remain independent.
Rollback. Revert the tool-executor patch and the for_l2 attacher addition. The /chat surface continues to work with no guidance applied — same posture as before M15.
Out of scope (this milestone).
- Per-provider prompt formatting. The 5 providers in oracle's tool-executor have slightly different system-prompt conventions; M15 folds guidance via the existing system-prompt assembly path rather than introducing per-provider rendering. Provider-specific tuning is a follow-up if measurable wins surface.
- Sharing the L2 cache across /chat and /api/v1/apollo/chat. Apollo's admin chat runs Apollo's MiniMax LLM and consults its own caches/state; M15 leaves it untouched. The two surfaces remain independent.
Cross-cutting concerns (applied in every milestone)
Testing
Every milestone lands with the subset of design §Test Expectations that applies to the code it introduces. No milestone merges without green tests for its own scope.
At every milestone boundary, run the full test suite across every repo the milestone touches — not just the newly-added Apollo tests. In practice:
- oracle/ — .venv/bin/python -m pytest (picks up tests/ + apollo/tests/ via the testpaths config).
- axonis-core/ — its own pytest run whenever the milestone added or changed anything in axonis-core (e.g., a new Schema.INDICES entry).
Report the pass count per repo in the milestone summary. Apollo milestones frequently touch axonis-core with additive changes; running the full suite catches regressions at the boundary rather than letting them propagate.
Observability
Every counter named in the design spec is registered with a zero value at startup (even before the code that drives it exists). Dashboards can be built as soon as Milestone 1 lands; they simply display zero for unused counters.
Logging
No Apollo module uses a module-local logging.getLogger() call. Every source file that emits log lines imports the three canonical loggers:
from axonis.core.logger import log, error, audit
This is the axonis-core implementation of the athena logging convention (athena/athena/logger.py) — same three-logger pattern, same format, same rotating-file handlers. Using the shared module guarantees Apollo's log lines interleave cleanly with oracle's and every other axonis service's when aggregated.
Per-file rule:
- log — routine telemetry (info, warning, debug).
- error — exceptions, permanent failures, data-loss events, misconfiguration. Must be used for every code path that surfaces a durable failure regardless of whether the exception is re-raised.
- audit — important transactions that must be independently traceable. See component.oracle.apollo-APOLLO §Logging → What counts as audit-worthy for the enumerated list. New milestones add to that list when they introduce new state-changing operations (e.g., M9's Curator actions, M11's admin-chat mutations, M12's autonomous commits).
Test discipline: milestone tests that assert on logging output import from axonis.core.logger (or monkeypatch it); they never instantiate a bare logging.Logger. A lint-level AST check catches stray logging.getLogger() calls in Apollo source.
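The lint-level AST check mentioned above could look roughly like the following. It is a sketch using only the standard-library ast module; the function name find_get_logger_calls is hypothetical, and the check as written does not follow aliased imports such as "import logging as lg".

```python
import ast
from typing import List


def find_get_logger_calls(source: str, filename: str = "<source>") -> List[int]:
    """Return the line numbers of getLogger(...) calls found in `source`."""
    tree = ast.parse(source, filename=filename)
    hits: List[int] = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Matches `logging.getLogger(...)`.
        if (
            isinstance(func, ast.Attribute)
            and func.attr == "getLogger"
            and isinstance(func.value, ast.Name)
            and func.value.id == "logging"
        ):
            hits.append(node.lineno)
        # Matches a bare `getLogger(...)` after `from logging import getLogger`.
        elif isinstance(func, ast.Name) and func.id == "getLogger":
            hits.append(node.lineno)
    return sorted(hits)


GOOD = "from axonis.core.logger import log, error, audit\nlog.info('ok')\n"
BAD = "import logging\nlogger = logging.getLogger(__name__)\n"
```

A CI step would run this over every file under the Apollo source tree and fail the build when any file returns a non-empty list.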
Invariants enforcement
Every milestone that touches Curator-adjacent code enforces the hard invariants from §Curator → Disallowed actions in unit tests. Example: attempting a Curator action that would read another user's conversation data must raise CuratorPolicyViolation regardless of which code path invokes it.
Index mapping versioning
Every Elastic index mapping includes schema_version: 1 on every document. If a future milestone changes a mapping, it bumps schema_version and ships a migration plan in the PR — no implicit mapping changes.
Settings discipline
No milestone reads os.environ directly. Every configuration value flows through oracle/apollo/settings.py, which is the sole in-code reader of env vars. This keeps the surface area for config changes bounded and testable.
Deployment environment inheritance
Apollo's platform-level dependencies — Elastic, Redis, SSO, log-level/workspace, federation — come from the deployment env files in developers-environment/conf/ (one .env per target: development.axonis.ai.env, matrix.axonis.ai.env, edge.axonis.ai.env, vector.axonis.ai.env, etc.). No milestone redefines, shadows, or duplicates those variables with an Apollo-specific equivalent. The full inheritance contract lives in design-spec §Environment Configuration.
Every APOLLO_* variable also lives in the shared env file. The canonical home for Apollo configuration is developers-environment/conf/development.axonis.ai.env (plus target-specific overrides in the same directory). oracle/apollo/settings.py reads them via os.getenv(...) with defaults that match the env-file values — so the codebase still comes up if the env isn't sourced, but the authoritative source for every operator-facing knob is the deployment env file, not the Python module.
When a milestone adds a new APOLLO_* variable:
- Register it in oracle/apollo/settings.py with its default.
- Add it to developers-environment/conf/development.axonis.ai.env in the appropriate subsystem block, with the same default.
- Document it in design-spec §Apollo-owned variables (the grouped list).
All three land in one commit. A milestone that only touches settings.py is incomplete.
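As an illustration of the single-reader rule, a settings module might be shaped like the sketch below. The ApolloSettings dataclass, the load_settings function, and the defaults for the attach flag and timeout are hypothetical placeholders; only the variable names and the 90-day lineage retention default appear in this plan. The sketch takes the environment as a mapping so tests can pass a plain dict, whereas the real module is described as calling os.getenv(...).

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ApolloSettings:
    guidance_attach_enabled: bool
    guidance_attach_timeout_ms: int
    lineage_retention_days: int


def _as_bool(raw: str) -> bool:
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> ApolloSettings:
    """Sole reader of APOLLO_* variables; defaults mirror the env file."""
    return ApolloSettings(
        # Placeholder default, not taken from the spec.
        guidance_attach_enabled=_as_bool(env.get("APOLLO_GUIDANCE_ATTACH_ENABLED", "true")),
        # Placeholder default, not taken from the spec.
        guidance_attach_timeout_ms=int(env.get("APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS", "50")),
        # The 90-day default is the one value stated in this plan.
        lineage_retention_days=int(env.get("APOLLO_LINEAGE_RETENTION_DAYS", "90")),
    )
```

Other modules would import the resulting settings object rather than touching os.environ, which keeps the set of operator-facing knobs visible in one file.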
Out of scope for this plan
The following work is intentionally deferred; each would be a separate plan once the corresponding prerequisites land:
- Consolidation of oracle's existing memory modules (§Deferred in the design spec). Apollo is additive throughout all 14 milestones; oracle/server/memory/* and oracle/server/models/memory.py are untouched.
- Keycloak client-credentials grant for service-to-service auth. Blocks background/batch ingest workers; not required for user-request-context ingest, which is all Milestones 1–14 need.
- Additional L3 emitter onboarding beyond oracle + cortex. Parallax, UDS, athena, testament, titan, rest/fedai-rest onboard in follow-up work. Per design spec §Integration Backlog, each service is made visible to Apollo via one of two paths: (a) in-process relay (default) — when oracle MCP-dispatches to the service, oracle observes the round-trip and emits on its behalf with no code change in the service beyond its component_kind declaration; or (b) direct POST via ApolloClient — when the service's outputs are not observable through an oracle-mediated MCP round-trip. No Apollo code change required either way. Subscriber-side wiring for any of these services follows the M14 cortex pattern (request-scoped ApolloGuidanceCache populated from arguments.apollo_guidance).
- Required-mode flips (APOLLO_REQUIRE_INTENT_SCHEMA, APOLLO_REQUIRE_TRACEPARENT). Stay false through these milestones; flipping to true is a post-Milestone-14 ops decision once coverage is proven.
- Production-grade minimax-local LLM provider. M8 ships the minimax-local provider as a scaffold — it honors the canonical HuggingFace load signature (AutoTokenizer.from_pretrained("MiniMaxAI/MiniMax-M2.7", trust_remote_code=True) + AutoModelForCausalLM.from_pretrained("MiniMaxAI/MiniMax-M2.7", trust_remote_code=True)) and resolves weights from the HF cache at ${HF_HOME:-~/.cache/huggingface}/hub/models--MiniMaxAI--MiniMax-M2.7/. The following production-hardening work is deferred (see design spec §Apollo's LLM → Local MiniMax via HuggingFace for the contract):
  - APOLLO_LLM_LOCAL_MODEL_PATH env override for operator-provided absolute paths (e.g., a mounted shared filesystem holding a custom MiniMax fine-tune).
  - Thread-pool / process-pool offload of the synchronous HF forward pass so the event loop isn't blocked.
  - Device mapping + quantization knobs (device_map="auto", torch_dtype, bitsandbytes 4-/8-bit settings) as env-configurable passthroughs.
  - Pre-pull orchestration + readiness gate: block APOLLO_LLM_PROVIDER=minimax-local instances from serving until the checkpoint is resident on disk and a warm-up forward pass has succeeded.
  - Streaming token output through the provider abstraction (admin chat UX).
Until these land, APOLLO_LLM_PROVIDER=openai with APOLLO_LLM_BASE_URL pointed at a hosted MiniMax endpoint is the default production pattern; minimax-local is a dev / air-gapped-lab fallback.
Verification at completion
When all 15 milestones are green:
- uv run pytest in oracle passes, including every test from design §Test Expectations.
- A full request lifecycle from L1 /chat → oracle → cortex → oracle → L1 produces:
  - Observations at every boundary under a single trace_id — all emitted by oracle in-process (cortex's source tree contains no Apollo emission code).
  - apollo_guidance attached to the /chat response and to the MCP dispatch, plus consulted in-process by oracle's own chat LLM (L2 path).
- Cortex consumes the attached guidance via ApolloGuidanceCache (M14): integration tests show the system prompt sent to a downstream LLM call inside a cortex tool grows when guidance carries a PromptShim. Oracle's chat LLM consumes guidance via the L2 in-process path (M15): the system prompt assembled by oracle/server/llm/tool_executor.py observably grows by the same amount when guidance is present. The L1 attach side is live but the L1 subscriber wiring (beacon) is deferred — see M14 §Out of scope.
- If synthetic failure signals are injected, the Evaluator demotes relevant artifacts within N cycles; the next attached MCP dispatch and the next oracle L2 cache populate both reflect the demotion, and the next cortex and oracle LLM calls observe the change.
- Admin can: inspect observations, view lineage, trigger synthesis, explain Curator decisions conversationally, roll back artifacts, pause/resume the Curator.
- The maintenance job runs hourly; expired observations are purged; snapshot coarsening tiers work.
- /stats exposes every metric named in the design spec.
- Apollo is unreachable: oracle still serves /chat and dispatches tools; apollo_guidance is omitted from envelopes; counters surface the degraded state.
Post-M15 — Prioritization Layers (shipped 2026-05-18)
The seven-layer prioritization rebuild is post-milestone work. It doesn't extend the M0–M15 build order — it overhauls how the attacher chooses which artifacts to send. Each layer is independently disabled by an env flag; none changes the underlying observation/synthesis/curator/evaluator model M0–M14 produced.
| Layer | Surface change |
|---|---|
| 1 | Capped artifacts get kind: "capped" lineage rows. New endpoints GET /lineage/capped and GET /artifacts/{id}/stats. |
| 2 | Five-tier sort key (evaluator_score → confidence → applicability specificity → weight → recency) + per-type attach caps. |
| 3 | Promote preserves evaluator_score, confidence, weight through _content_from_proposal. Contract pinned by test. |
| 4-A | Evaluator writes content.evaluator_score back to the artifact after every signal application. |
| 4-B | Synthesis prompts require confidence: 0.0..1.0; _normalize_confidence clamps + defaults. |
| 5 | rationale_summary names attached + capped artifact IDs per type. aggregate_artifact_stats query. |
| 6-A + 6-B | Promote computes an embedding and surfaces similar active artifacts as an advisory in the response. |
| 6-C | New coalescer background loop clusters near-duplicates and queues LLM-merged proposals with supersedes: [...]. Off by default. |
Full contract in design spec §Prioritization Layers. Backlog (the cap-defaults empirical study, §12.9 in docs/APOLLO-FUTURE-IMPROVEMENTS.md) waits on accumulated production telemetry rather than additional code.
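The Layer 2 ordering and per-type caps can be pictured with the sketch below. The Artifact dataclass, its field names, the select function, and the default cap are hypothetical, and the sketch assumes every tier sorts higher-first, which the table does not state explicitly.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Artifact:
    artifact_id: str
    type: str
    evaluator_score: float
    confidence: float
    specificity: int   # applicability specificity; higher = narrower match
    weight: float
    updated_at: float  # epoch seconds, used as the recency tier


def sort_key(a: Artifact) -> Tuple[float, float, int, float, float]:
    # Tuple comparison applies the five tiers in order; later tiers break ties.
    return (a.evaluator_score, a.confidence, a.specificity, a.weight, a.updated_at)


def select(
    artifacts: List[Artifact], caps: Dict[str, int], default_cap: int = 3
) -> Tuple[List[Artifact], List[Artifact]]:
    """Split ranked artifacts into (attached, capped) using per-type caps."""
    attached: List[Artifact] = []
    capped: List[Artifact] = []
    used: Dict[str, int] = {}
    for art in sorted(artifacts, key=sort_key, reverse=True):
        if used.get(art.type, 0) < caps.get(art.type, default_cap):
            attached.append(art)
            used[art.type] = used.get(art.type, 0) + 1
        else:
            capped.append(art)  # candidates for a kind: "capped" lineage row
    return attached, capped
```

With three prompt-shim artifacts and a cap of two, the two highest-ranked are attached and the third lands in the capped list, which is the population Layer 1's lineage rows and Layer 5's rationale summary report on.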
Post-M15 — Longevity surfaces (shipped 2026-05-19)
Two operator-facing surfaces designed to answer "how is Apollo holding up over time", complementing the per-trace observability M9-M14 already provided.
| Surface | Change |
|---|---|
| Effectiveness rollup | New GET /effectiveness/summary?window=1d\|7d\|30d\|90d (apollo/effectiveness.py). One read-only call aggregates observations (by event_type / service / caller_kind), synthesis (proposals + status mix + avg confidence + histogram), curator (audit-row counts by action), attach pressure (attached vs capped + per-service breakdown), artifact inventory (active + by-type + embedding coverage), and evaluator queue depth. Each section is independently failure-tolerant — a broken store returns a zero shape rather than poisoning the rest of the response. |
| Receiver-side persistence | ApolloGuidanceCache(persist_path=...) (axonis-core/axonis/apollo/guidance_cache.py). Successful update() atomically writes the snapshot to disk; the constructor reads it back so L1/L3 receivers serve last-known-good guidance across restarts. New last_updated_at(), is_stale(max_age_seconds), and snapshot(max_age_seconds=...) expose a serving_stale flag so an oracle outage that strands receivers on yesterday's cache is visible without grepping logs. Opt-in — omitting persist_path keeps the historical pure-in-memory behavior. |
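
The receiver-side persistence row describes an atomic write plus a staleness check. A minimal sketch of that pattern follows, assuming a JSON snapshot file and a write-to-temp-then-rename strategy; the PersistentSnapshot class and the on-disk layout are hypothetical and are not the ApolloGuidanceCache implementation.

```python
import json
import os
import tempfile
import time
from typing import Optional


class PersistentSnapshot:
    """Hypothetical last-known-good store; omit persist_path for in-memory use."""

    def __init__(self, persist_path: Optional[str] = None) -> None:
        self._path = persist_path
        self._guidance: Optional[dict] = None
        self._updated_at: Optional[float] = None
        if persist_path and os.path.exists(persist_path):
            try:
                with open(persist_path, encoding="utf-8") as fh:
                    data = json.load(fh)
                self._guidance = data["guidance"]
                self._updated_at = float(data["updated_at"])
            except (OSError, ValueError, KeyError, TypeError):
                pass  # unreadable snapshot: start empty rather than fail boot

    def update(self, guidance: dict) -> None:
        self._guidance = guidance
        self._updated_at = time.time()
        if self._path is None:
            return
        directory = os.path.dirname(os.path.abspath(self._path))
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as fh:
                json.dump({"guidance": guidance, "updated_at": self._updated_at}, fh)
                fh.flush()
                os.fsync(fh.fileno())
            # os.replace is atomic when source and target share a filesystem.
            os.replace(tmp_path, self._path)
        except BaseException:
            if os.path.exists(tmp_path):
                os.unlink(tmp_path)
            raise

    def guidance(self) -> Optional[dict]:
        return self._guidance

    def last_updated_at(self) -> Optional[float]:
        return self._updated_at

    def is_stale(self, max_age_seconds: float) -> bool:
        if self._updated_at is None:
            return True
        return (time.time() - self._updated_at) > max_age_seconds
```

Writing to a temp file in the same directory and renaming means a reader never observes a half-written snapshot, so a crash mid-write leaves the previous last-known-good file in place.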
Outstanding Spec Gaps
Authoritative home: SPEC-PLATFORM-14-APOLLO.md §Outstanding Items. This implementation plan no longer duplicates the open-issue register; the design spec is the single source of truth for what Apollo still owes against its mandates. Tier numbering, item status (OPEN / RESOLVED / WITHDRAWN), and resolution citations all live there.
This file retains:
- The 15-milestone build order above (the "how" of getting Apollo into production).
- The §Plan: axonis-core Bootstrap Idempotency Fix below (the concrete fix for Tier 2 items 5–6, which the design spec’s Outstanding Items cross-references back to).
When closing an Outstanding Item, update both: the design spec marks it RESOLVED with the commit ref; this plan adds a Post-M15 entry if the closure required new code paths worth a milestone-style writeup.
Plan: axonis-core Bootstrap Idempotency Fix
Concrete implementation plan for Tier 2 items 5–6 — the two axonis/elastic/manager.py idempotency bugs that block conduit + parallax boots against shared ES with partial state.
Repo: axonis-core (NOT oracle). Target branch: fusion-apollo (already merged from main as of fd11e19). Release target: v4.18.0 (pure bugfix, no API change).
Step 1 — Tighten the existence check (root cause of both bugs)
The current _ensure_index uses any(k.startswith(index) for k in existing.keys()). This conflates the bare base name (data-fusion) with the date-stamped form (data-fusion-2026.06.04) and with sibling indices that happen to share a prefix (data-fusion-old). The right check is: does a date-stamped index for this base name exist? Anything else means we still need to create one.
File: axonis/elastic/manager.py
    import re
    from datetime import datetime

    from elasticsearch import BadRequestError, NotFoundError

    # `log` and `read_template` are assumed to be in scope in manager.py already.

    _DATE_SUFFIX_RE = re.compile(r"^(?P<base>.+)-\d{4}\.\d{2}\.\d{2}$")


    def _has_dated_index(base: str, existing: dict) -> bool:
        """True when `existing` carries at least one `<base>-YYYY.MM.DD` index.

        Bare-name indices (e.g. a manually-created `data-fusion` from a legacy
        workflow) do NOT count — bootstrap needs the date-stamped form so the
        `<base>-*` alias has something to point at.
        """
        for name in existing.keys():
            m = _DATE_SUFFIX_RE.match(name)
            if m and m.group("base") == base:
                return True
        return False


    # ElasticManager method, shown at module indentation for brevity.
    def _ensure_index(self, index: str, existing: dict) -> None:
        if _has_dated_index(index, existing):
            return
        dated = "-".join([index, datetime.today().strftime("%Y.%m.%d")])
        try:
            self.es.indices.create(
                index=dated,
                body=read_template(file_name=f"{index}_mapping.json"),
            )
            log.info(f"CREATING INDEX: {dated}")
        except BadRequestError as exc:
            # Multi-worker race: another HPA replica won the create. Treat
            # as idempotent success — the index now exists either way.
            if "resource_already_exists_exception" in str(exc):
                log.info(f"INDEX RACE OK: {dated} created by peer")
                return
            raise
This single change resolves Bug #5 (the race + the orphan-index re-create) AND eliminates the precondition for Bug #6 (parallax's bare data-fusion no longer short-circuits the create, so data-fusion-YYYY.MM.DD is created, and the subsequent put_alias(index="data-fusion-*", ...) has something to match).
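To make the difference between the two existence checks concrete, the sketch below runs the old prefix test and the date-suffix test side by side on the three index shapes named in Step 1. The old_check and new_check names are illustrative only; the regex is the one from the step above.

```python
import re

_DATE_SUFFIX_RE = re.compile(r"^(?P<base>.+)-\d{4}\.\d{2}\.\d{2}$")


def old_check(base: str, existing: dict) -> bool:
    # Prefix test: also true for the bare name and for sibling indices.
    return any(name.startswith(base) for name in existing)


def new_check(base: str, existing: dict) -> bool:
    # Date-suffix test: true only for `<base>-YYYY.MM.DD`.
    for name in existing:
        m = _DATE_SUFFIX_RE.match(name)
        if m and m.group("base") == base:
            return True
    return False


CASES = {
    "bare": {"data-fusion": {}},
    "sibling": {"data-fusion-old": {}},
    "dated": {"data-fusion-2026.06.04": {}},
}

for label, existing in CASES.items():
    print(label, old_check("data-fusion", existing), new_check("data-fusion", existing))
```

The bare and sibling cases are where the two checks disagree: the prefix test reports an index as present and skips the create, while the date-suffix test correctly reports that no dated index exists yet.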
Step 2 — Defensive alias self-heal
Belt-and-suspenders for the case where _ensure_index does create the dated index but ES's view of existing (snapshotted in bootstrap) is stale by the time we hit _ensure_alias. Catch NotFoundError and re-fetch:
    # ElasticManager method, shown at module indentation for brevity.
    def _ensure_alias(self, alias: str, index: str, existing: dict) -> None:
        if any(alias in info.get("aliases", {}) for info in existing.values()):
            return
        try:
            self.es.indices.put_alias(index=f"{index}-*", name=alias)
        except NotFoundError:
            # Bootstrap's `existing` snapshot was taken before _ensure_index
            # created the dated index. Re-resolve the actual index and put
            # the alias on it directly.
            dated = "-".join([index, datetime.today().strftime("%Y.%m.%d")])
            self.es.indices.put_alias(index=dated, name=alias)
        log.info(f"CREATING ALIAS: {alias}")
Step 3 — Regression tests
Add tests/test_elastic_bootstrap_idempotency.py. Use unittest.mock.MagicMock for the ES client so the tests are pure-unit and don't require a live cluster. Three tests cover the two bugs:

    """Regression: bootstrap is idempotent against partial ES state.

    Both scenarios reproduce production failures from 2026-06-04 (oracle
    SPEC-PLATFORM-14-IMPLEMENTATION Tier 2 §5–6). Each pre-state should now
    result in a clean boot — no exception propagates out of bootstrap().
    """
    from unittest.mock import MagicMock

    import pytest
    from elasticsearch import BadRequestError, NotFoundError

    from axonis.elastic.manager import ElasticManager


    def _make_manager(get_alias_return: dict, create_raises: Exception | None = None,
                      put_alias_raises: Exception | list | None = None) -> ElasticManager:
        """Construct an ElasticManager wired to a MagicMock es client. Skips
        the __init__ network setup entirely."""
        mgr = ElasticManager.__new__(ElasticManager)
        es = MagicMock()
        es.indices.get_alias.return_value = get_alias_return
        es.indices.create.side_effect = create_raises
        es.indices.put_alias.side_effect = put_alias_raises
        mgr.es = es
        return mgr


    def test_ensure_index_swallows_resource_already_exists_race():
        """Bug #5: when another HPA worker wins the create() race, the loser
        must not crash — both should converge on 'index exists'."""
        mgr = _make_manager(
            get_alias_return={},  # we see no index — try to create
            create_raises=BadRequestError(
                400, "resource_already_exists_exception", "...",
            ),
        )
        # Must not raise; idempotent.
        mgr._ensure_index("data-ingest", existing={})


    def test_ensure_index_creates_dated_when_only_bare_index_exists():
        """Bug #6 prerequisite: a legacy bare `data-fusion` index in ES (no
        date suffix, no alias) must NOT trick _ensure_index into thinking
        a usable index exists. The create must still run."""
        mgr = _make_manager(get_alias_return={"data-fusion": {"aliases": {}}})
        mgr._ensure_index("data-fusion", existing={"data-fusion": {"aliases": {}}})
        mgr.es.indices.create.assert_called_once()
        called_index = mgr.es.indices.create.call_args.kwargs["index"]
        assert called_index.startswith("data-fusion-")  # date-stamped
        assert called_index != "data-fusion"


    def test_ensure_alias_self_heals_on_empty_pattern_match():
        """Bug #6: when put_alias hits 404 because the bootstrap's `existing`
        snapshot was taken before _ensure_index created the dated index,
        self-heal by targeting the freshly-created dated index by name."""
        mgr = _make_manager(
            get_alias_return={},
            # An iterable side_effect raises members that are exceptions and
            # returns the rest: the first put_alias call 404s, the second
            # succeeds. (A callable that returns the exception would not raise.)
            put_alias_raises=[NotFoundError(404, "...", "..."), None],
        )
        # Should not raise.
        mgr._ensure_alias("data-fusion", "data-fusion", existing={})
        # The retry happened, and it landed on the dated form, not the wildcard.
        assert mgr.es.indices.put_alias.call_count == 2
        final_call = mgr.es.indices.put_alias.call_args
        assert final_call.kwargs["index"].startswith("data-fusion-")
        assert final_call.kwargs["index"] != "data-fusion-*"
Step 4 — Release + downstream fanout
- Branch off fusion-apollo: git checkout -b fix/elastic-bootstrap-idempotency.
- Apply Step 1 + Step 2 to axonis/elastic/manager.py.
- Add Step 3 test file. Verify uv run pytest tests/test_elastic_bootstrap_idempotency.py -v passes.
- Open MR → axonis-core main. semantic-release publishes v4.18.0.
- Bump consumers (uv add 'axonis-core>=4.18.0' in each):
  - oracle/pyproject.toml
  - cortex/pyproject.toml
  - conduit/pyproject.toml
  - parallax/pyproject.toml
  - prism/pyproject.toml
- In each consumer, run uv lock --upgrade-package axonis-core to refresh the lockfile to v4.18.0.
Step 5 — Verification against the production scenarios
After the bump lands in oracle:
- Re-run the workflow suite without the harness workaround (_DEFAULT_SERVICES back to the full 5): uv run pytest -m workflow -v
- Expected: conduit boots cleanly even with the stale data-ingest-2026.06.04 still in ES; parallax boots cleanly against the bare data-fusion (24 docs).
- test_workflow_oracle_to_parallax::test_guidance_injected_into_parallax_dispatch should now run (was skipped pending parallax boot).
- Strike Tier 2 items 5 + 6 from §Outstanding Spec Gaps.
Risk + rollback
- Risk: the tightened _has_dated_index check changes which indices count as "existing." A service that previously got away with a bare base-name index would now see a dated index created alongside it. The bare index becomes orphaned but not deleted. Mitigation: callout in the v4.18.0 release notes recommending operators clean up legacy bare indices once the alias is correctly attached to the new dated one.
- Rollback: the change is contained to two private methods. Reverting the commit and re-tagging is straightforward; consumers that haven't yet bumped won't notice.
Function Flow Index
A developer reference cataloging every traversal in Apollo's runtime: which function calls which, with file:line citations and a brief reason. Use this as a debugging companion: pick a flow, follow the steps, identify where a request actually deviates from the documented path.
How to read. Each flow has a one-line trigger and a numbered list. Each step is shaped "caller (file:line) → callee — reason". Citations point at the actual call site. The deeper call-graph diagrams live in §Technical Overview (this spec); this section is the index.
Three groups:
- A. Request-time hot paths — synchronous; runs on the user's /chat thread.
- B. Background workers — async; off the request thread.
- C. Admin / lifecycle — operator-driven or process-lifetime.
Path conventions:
- oracle/... = the oracle repo root.
- cortex/... = the cortex repo root.
- apollo/... (alone) = oracle/oracle/....
A. Request-time hot paths
A1. /chat request — full lifecycle
Trigger: an L1 caller POSTs /chat to oracle.
POST /api/v1/chat is oracle's user-facing chat surface, driven by oracle's own LLM tool-use loop in oracle/server/llm/tool_executor.py (5-provider gateway). Today's caller is curl, an integration test, or a direct API client; beacon onboards once a beacon↔oracle connection is wired. /api/v1/chat is distinct from POST /api/v1/apollo/chat (Apollo's admin chat — see flow C1), which runs Apollo's independent MiniMax LLM.
1. TraceparentMiddleware.__call__ (oracle/server/middleware/trace.py:56) — reads the inbound traceparent header, calls _trace.parse_traceparent (:67); on missing/malformed calls _trace.mint_traceparent (:85); installs the result via _trace.set_current_traceparent (:87) so every emit downstream stamps the same id.
2. OAuthMiddleware.__call__ (axonis-core) — validates the Bearer token, writes request.state.token_payload so dependencies that need auth can read it.
3. routes.chat (oracle/server/api/routes.py:106) — handler entry; reads body + token.
4. routes.chat:147 calls apollo_chat.emit_user_prompt to record the L1-origin observation before any work begins.
5. routes.chat calls ToolExecutor().run(...) — drives the LLM tool-use loop (see flow A2).
6. routes.chat:171 calls apollo_chat.emit_final_response so Apollo records what the user is about to see.
7. routes.chat:185 calls apollo_attacher.for_l1 to compose the apollo_guidance block (see flow A5).
8. routes.chat returns ChatResponse(..., apollo_guidance=...) to L1.
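The parse-or-mint behavior in the traceparent step can be sketched as below, following the W3C Trace Context version-00 header layout (version, 32-hex trace id, 16-hex parent id, 2-hex flags). The function bodies are illustrative and are not the axonis implementation; resolve_traceparent is a hypothetical helper, and the strict four-field regex is a simplification that would reject future header versions carrying extra fields.

```python
import re
import secrets
from typing import Optional

_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def parse_traceparent(header: Optional[str]) -> Optional[dict]:
    """Return the parsed fields, or None when the header is missing or malformed."""
    if not header:
        return None
    m = _TRACEPARENT_RE.match(header.strip())
    if m is None:
        return None
    version, trace_id, parent_id, flags = m.groups()
    # The spec forbids version ff and all-zero trace/parent ids.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {"version": version, "trace_id": trace_id,
            "parent_id": parent_id, "flags": flags}


def mint_traceparent() -> str:
    # Fresh random ids; the 01 flag marks the trace as sampled.
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"


def resolve_traceparent(header: Optional[str]) -> str:
    """Keep a valid inbound header; otherwise mint a new one."""
    if parse_traceparent(header) is not None:
        return header.strip()
    return mint_traceparent()
```

Whatever resolve_traceparent returns is what a middleware would install for the rest of the request, so every downstream emit carries the same trace id.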
A2. Tool-use loop inside ToolExecutor.run
Trigger: routes.chat invoked the LLM tool-use loop.
1. ToolExecutor.run (oracle/server/llm/tool_executor.py:89) calls the LLM provider router — gets the next LLM turn.
2. ToolExecutor.run:156 calls apollo_chat.emit_llm_turn — records the L2-origin turn.
3. If the LLM returned tool_calls, ToolExecutor.run:205 calls _call_backend_tool per call (see flow A3).
4. After each call returns, ToolExecutor.run checks for an error envelope: on error → :220 calls apollo_chat.emit_tool_error; on success → :231 calls apollo_chat.emit_tool_output. Oracle observes the L3 round-trip and emits on the L3 service's behalf.
5. Loop until the LLM stops emitting tool_calls; return text + tool_call trace to routes.chat.
A3. Outbound MCP dispatch (_call_backend_tool)
Trigger: ToolExecutor.run decided to invoke an L3 tool.
1. _call_backend_tool (oracle/server/llm/tool_executor.py:301) calls registry.get_tool_route (:308) to look up the L3 service base URL.
2. _call_backend_tool:333 calls axonis.trace.get_current_traceparent and at :335 adds the value as an HTTP header so L3's logs correlate.
3. _call_backend_tool:343 calls registry.get_tool to read component_kind. If :345 component_kind == "agent", :347 calls apollo_attacher.for_l3_agent (see A6) and injects the result into arguments["apollo_guidance"]. Library-kind tools skip the attach and injection.
4. _call_backend_tool:356 opens httpx.AsyncClient(timeout=30.0) and POSTs the JSON-RPC tools/call body to <base_url>/agentspace/mcp.
5. Returns the parsed response to ToolExecutor.run.
A4. Cortex consumption — L3 side (M14)
Trigger: cortex's MCP server receives the JSON-RPC request from oracle.
Consumer-side wiring is now in axonis.apollo.ApolloMCPMiddleware (axonis-core) — installed once in cortex/server/__main__.py:153 and covers every @mcp.tool() cortex defines. The FastMCP handler at cortex/server/mcp/server.py is Apollo-unaware; the middleware handles popping + cache install + observation emit at the ASGI boundary.
1. The middleware (axonis-core/axonis/apollo/mcp_middleware.py:__call__) intercepts every POST to /agentspace/mcp. It buffers the JSON-RPC body and calls _parse_mcp_request (:248) which pops arguments.apollo_guidance and returns the stripped arguments.
2. The middleware re-serializes the request with the stripped arguments (_reserialize_with_stripped_arguments, :268) so the FastMCP handler never sees apollo_guidance as a stray kwarg.
3. The middleware builds a fresh ApolloGuidanceCache, calls cache.update(extracted_guidance), and installs it on the request-scoped contextvar via axonis.apollo.request_scope.set_cache (:135). Empty / missing guidance still installs an empty cache so accessor calls return [] cleanly.
4. The middleware forwards the (now stripped + cached) request to FastMCP, which routes to the registered tool function (e.g., intelligence_create, draft_narrative_with_evidence).
5. The tool body calls axonis.apollo.get_cache() (or, equivalently, cortex.tools.ai_support._get_apollo_cache()); the helper _format_apollo_guidance (cortex/ai_support_tools.py:81) reads get_system_prompt_additions, get_active_failure_patterns, get_spec_fragments and renders a ## Apollo Guidance section. Empty accessors omit their sub-section; an entirely empty cache produces a byte-identical pre-M14 prompt.
6. The tool calls LLMClient.complete(messages=[{"content": prompt}]) with the augmented prompt.
7. The middleware's finally block calls reset_cache(token_cache) (mcp_middleware.py:151) — contextvar cleared whether the tool returned or raised. No cross-request leakage.
8. On the way out, the middleware captures the JSON-RPC response (captured_send at :143) and emits a tool_output or tool_error observation back to oracle via ApolloClient.emit (:177).
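The pop-and-reserialize steps at the start of this flow can be illustrated with plain JSON handling. The sketch assumes the MCP tools/call request shape (params.name plus params.arguments); strip_apollo_guidance is a hypothetical name that combines what the flow attributes to _parse_mcp_request and _reserialize_with_stripped_arguments, and it is not the axonis-core code.

```python
import json
from typing import Optional, Tuple


def strip_apollo_guidance(raw_body: bytes) -> Tuple[bytes, Optional[dict]]:
    """Return (body without apollo_guidance, extracted guidance or None).

    Any body that is not a tools/call request carrying the key is passed
    through untouched, so non-guidance traffic is unaffected.
    """
    try:
        payload = json.loads(raw_body)
    except ValueError:
        return raw_body, None
    if not isinstance(payload, dict) or payload.get("method") != "tools/call":
        return raw_body, None
    params = payload.get("params")
    arguments = params.get("arguments") if isinstance(params, dict) else None
    if not isinstance(arguments, dict) or "apollo_guidance" not in arguments:
        return raw_body, None
    guidance = arguments.pop("apollo_guidance")
    stripped = json.dumps(payload).encode("utf-8")
    return stripped, guidance if isinstance(guidance, dict) else None


request = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "intelligence_create",
        "arguments": {"title": "t", "apollo_guidance": {"artifacts": []}},
    },
}).encode("utf-8")

stripped_body, extracted = strip_apollo_guidance(request)
```

After the call, stripped_body carries only the tool's own arguments, and extracted holds the guidance block that would be handed to cache.update(...).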
A5. L1 guidance attach (attacher.for_l1)
Trigger: routes.chat is about to serialize the response.
1. apollo/guidance/attacher.py:161 for_l1(...) calls _attach(layer="l1", ...) (:285).
2. _attach calls selectors.match_artifacts(layer="l1", ...) (apollo/guidance/selectors.py:40) and bounds the work by APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS.
3. On overshoot, _attach increments metrics.GUIDANCE_ATTACH_TIMEOUT_TOTAL.labels(scope="l1") and returns None so the response serializes without the field.
4. On success, if a trace_id was supplied, _attach calls two attribution writes (both wrapped in defensive try/except — neither can break the chat response):
   - In-memory: oracle.evaluator.attribution.get().record(trace_id, scope, artifact_ids) — fast, TTL-bounded by APOLLO_GRAPH_TRACE_STATE_TTL_SEC, used by the Evaluator's hot-path signal correlation.
   - Persistent (§7.3): oracle.lineage.persist_attach(trace_id, scope, artifact_ids) — schedules a fire-and-forget asyncio task that writes one row per (trace_id, scope, artifact_id) to apollo_lineage_events. Retained for APOLLO_LINEAGE_RETENTION_DAYS (default 90). Powers retroactive /lineage queries over older traffic.
5. Returns {as_of, artifacts, rationale_summary} to routes.chat.
Steps 1–5 are the L1 attach side: oracle composes guidance for the response envelope. The L1 consumer side (beacon-style clients reading apollo_guidance from the response and calling ApolloGuidanceCache.update(...) locally) waits on the beacon↔oracle connection design.
In parallel, the same guidance set is consumed by oracle's own chat LLM via the L2 in-process path: attacher.for_l2(...) populates a process-local cache that oracle/server/llm/tool_executor.py reads on each tool-use turn. No transport — oracle hosts Apollo, so the cache is just a Python object passed by reference. See flow A7 below.
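The bounded attach described in flow A5 (select within a time budget, count overshoots, return None rather than fail the request) can be sketched with asyncio.wait_for. Everything named here is hypothetical: attach_with_budget, the ATTACH_TIMEOUTS dict standing in for the labelled counter, and the selector callables. Only the returned keys mirror the {as_of, artifacts, rationale_summary} shape named above.

```python
import asyncio
from datetime import datetime, timezone
from typing import Awaitable, Callable, Dict, List, Optional

# Stand-in for a labelled metrics counter; not the real metrics registry.
ATTACH_TIMEOUTS: Dict[str, int] = {}


async def attach_with_budget(
    select: Callable[[], Awaitable[List[str]]],
    scope: str,
    timeout_ms: int,
) -> Optional[dict]:
    """Run the selector within the budget; on overshoot or error return None."""
    try:
        artifact_ids = await asyncio.wait_for(select(), timeout=timeout_ms / 1000)
    except asyncio.TimeoutError:
        ATTACH_TIMEOUTS[scope] = ATTACH_TIMEOUTS.get(scope, 0) + 1
        return None  # caller serializes the response without the field
    except Exception:
        return None  # fail open: guidance never breaks the request
    return {
        "as_of": datetime.now(timezone.utc).isoformat(),
        "artifacts": artifact_ids,
        "rationale_summary": f"{len(artifact_ids)} artifact(s) matched for {scope}",
    }


async def fast_selector() -> List[str]:
    return ["artifact-1"]


async def slow_selector() -> List[str]:
    await asyncio.sleep(0.2)
    return ["artifact-1"]
```

Calling the function with slow_selector and a 10 ms budget returns None and bumps the scope's timeout count, while fast_selector returns the populated block. The same budget and fail-open posture applies to the L1, L2, and L3 scopes.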
A6. L3 guidance attach (attacher.for_l3_agent)
Trigger: _call_backend_tool is about to POST to an agent-kind L3 service.
apollo/guidance/attacher.py:122 for_l3_agent(...) calls the same _attach(...) path (:173) but with layer="l3" and scope_label="l3:<service>". selectors.match_artifacts filters on applicability.layer == "l3" and the target service_name. Library-kind dispatches never reach this function — _call_backend_tool:345 filters them out.
A7. L2 guidance consumption — oracle's own chat LLM (M15)
Trigger: oracle/server/llm/tool_executor.py is about to assemble the prompt for the next tool-use turn during a /chat request.
1. Tool-executor calls attacher.for_l2(user, intent_class, caller_tags, trace_id) with layer="l2" and scope_label="l2". selectors.match_artifacts filters on applicability.layer == "l2".
2. The returned AttachedGuidance payload populates a request-scoped ApolloGuidanceCache (held in a ContextVar at oracle.server.llm.apollo_cache so concurrent /chat requests don't share cache state).
3. Before the provider call, tool-executor reads cache.get_system_prompt_additions(intent_context) and folds the strings into its system prompt; reads cache.get_tool_description_overrides(...) and applies them to the tool catalog rendering.
4. After tool dispatch returns, tool-executor consults cache.get_tool_pairing_hints(current_tool) for follow-up suggestions.
5. On for_l2 timeout / failure: cache stays empty for the turn, tool-executor proceeds with its baseline prompt — the /chat request still succeeds (failure posture mirrors L1/L3).
No transport — oracle hosts Apollo, so the cache is a Python object passed by reference within the same process. The L2 path is symmetric with L1 (response-attach) and L3 (MCP-arg-attach) in artifact applicability filtering and timeout budget; it differs only in transport.
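A request-scoped cache held in a ContextVar, as flow A7 describes, might look like the sketch below. GuidanceCache is a simplified, hypothetical stand-in for ApolloGuidanceCache: the real accessor takes an intent_context argument, and the payload key system_prompt_additions is assumed here for illustration only.

```python
import asyncio
from contextvars import ContextVar
from typing import Optional


class GuidanceCache:
    """Simplified stand-in for ApolloGuidanceCache (hypothetical shape)."""

    def __init__(self) -> None:
        self._additions: list = []

    def update(self, payload: Optional[dict]) -> None:
        # No-op when no guidance was attached (timeout or failure path).
        if payload:
            self._additions = list(payload.get("system_prompt_additions", []))

    def get_system_prompt_additions(self) -> list:
        return list(self._additions)


cache_var: ContextVar = ContextVar("apollo_cache", default=None)


def build_system_prompt(base: str) -> str:
    # Reads whichever cache belongs to the current request's context.
    cache = cache_var.get()
    additions = cache.get_system_prompt_additions() if cache else []
    return "\n".join([base, *additions])


async def handle_chat(payload: Optional[dict]) -> str:
    cache = GuidanceCache()
    cache.update(payload)
    cache_var.set(cache)
    await asyncio.sleep(0)  # yield so concurrent requests interleave
    return build_system_prompt("BASE")


async def demo() -> list:
    # gather() runs each coroutine in its own task, each with its own context copy.
    return await asyncio.gather(
        handle_chat({"system_prompt_additions": ["prefer tool X"]}),
        handle_chat(None),
    )
```

Because each asyncio task gets its own copy of the context, the second request falls back to the baseline prompt even though the first request set a populated cache moments earlier.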
B. Background workers
B1. Observation drain (per envelope)
Trigger: oracle.observer.ingest._queue.put_nowait was called by an emit helper or a secondary-path POST.
1. _drain_worker (apollo/observer/ingest.py:188) calls _queue.get (:201) to dequeue the next envelope.
2. _drain_worker:207 calls _is_duplicate (:380); if dup, calls _queue.task_done (:209) and continues.
3. _drain_worker:213 calls _write_with_retry (:342) — bounded retries on transient ES failures, eventually _default_writer (:402) → ApolloObservations.create.
4. _drain_worker:218 calls extractors_module.apply(envelope, graph_set) (apollo/learner/extractors.py:46) — mutates the five Decision Graphs deterministically (see B2).
5. _drain_worker:219 calls graph_set.drain_all_dirty and :221 invokes the _graph_writer callback (_default_graph_writer at :414) to persist any dirty nodes/edges to apollo_graph_nodes / apollo_graph_edges.
6. _drain_worker calls SynthesisEngine().schedule(envelope) (apollo/learner/synthesis.py:135) — fires the LLM if event type triggers (see B3).
7. _drain_worker calls _evaluate_envelope(envelope) (apollo/observer/ingest.py:280) — runs the Evaluator pipeline (see B5). Synthesis runs before the evaluator on each envelope; either failing leaves observation persistence intact (both wrapped in try/except).
8. On worker exception path, _drain_worker:269 calls _dead_letter to write the envelope as JSONL if APOLLO_INGEST_DEAD_LETTER_PATH is set.
B2. Decision-graph update (deterministic, no LLM)
Trigger: extractors.apply runs inside the drain worker, post-write.
1. extractors.apply (apollo/learner/extractors.py:46) calls each per-graph extractor: _extract_intent_tool (:114), _extract_prompt_shape (:158), _extract_service_routing (:186), _extract_outcome (:219), _extract_iteration (:270).
2. Each extractor calls graph_set.graph(graph_id) (apollo/learner/graphs.py:291) to get the right DecisionGraph.
3. Each extractor calls graph.upsert_node(kind, label, trace_id, at, ...) (apollo/learner/graphs.py:146) and graph.upsert_edge(source, target, trace_id, at, success) — idempotent per (graph_id, kind, label) and per-trace.
4. upsert_edge updates weight_short and weight_long EWMAs using APOLLO_GRAPH_EWMA_SHORT / APOLLO_GRAPH_EWMA_LONG. First observation pins both windows to the observation value (no cold-start artifact).
5. Some extractors call graph_set.trace_scratch(trace_id) (:307) to stash intent/service info for later events on the same trace to stitch into.
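The dual-window EWMA update on an edge, including the first-observation pin, can be illustrated with a short sketch. The smoothing factors (0.3 and 0.05) and the mapping of success to an observation of 1.0 or 0.0 are assumptions made for this example; the actual values come from APOLLO_GRAPH_EWMA_SHORT / APOLLO_GRAPH_EWMA_LONG and the real upsert_edge.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EdgeWeights:
    weight_short: Optional[float] = None
    weight_long: Optional[float] = None


def update_edge(
    edge: EdgeWeights,
    success: bool,
    alpha_short: float = 0.3,   # assumed value, not the configured one
    alpha_long: float = 0.05,   # assumed value, not the configured one
) -> EdgeWeights:
    x = 1.0 if success else 0.0
    if edge.weight_short is None or edge.weight_long is None:
        # First observation: pin both windows so neither starts biased toward 0.
        edge.weight_short = x
        edge.weight_long = x
    else:
        # Standard EWMA step: new = alpha * x + (1 - alpha) * old.
        edge.weight_short = alpha_short * x + (1 - alpha_short) * edge.weight_short
        edge.weight_long = alpha_long * x + (1 - alpha_long) * edge.weight_long
    return edge
```

After one success followed by one failure, the short window falls to 0.7 while the long window only falls to 0.95, which is the divergence a trajectory check can read.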
B3. Synthesis dispatch (LLM-driven)
Trigger: _drain_worker calls SynthesisEngine.schedule(envelope) after the graph update.
1. SynthesisEngine.schedule (apollo/learner/synthesis.py:135) looks up _SYNTHESIS_FLAVOR[envelope.event_type]. If None (e.g., llm_turn, final_response), returns immediately.
2. If trace_id is in self._in_flight, records the latest envelope in self._latest_by_trace and returns — the running task picks it up on completion.
3. Otherwise calls asyncio.create_task(self._run_trace(trace_id, flavor)).
4. _run_trace (apollo/learner/synthesis.py:195) acquires self._sem (concurrency cap = APOLLO_SYNTHESIS_MAX_CONCURRENT), then calls _synthesize_from_envelope (:218).
5. _synthesize_from_envelope calls _slice_graph_set (:344) to pull the relevant subgraph; calls the appropriate prompts.build_*_prompt (apollo/learner/prompts.py) to compose the LLM input.
6. _synthesize_from_envelope:236 calls LLMClient.get().complete(system=..., messages=...) — this is Apollo's own LLM, separate from oracle's user-facing one.
7. _synthesize_from_envelope:294 calls drift_module.run_all (see B4) to gate the proposal.
8. _synthesize_from_envelope:315 appends the proposal to self._pending with status: "approved" or "drift_flagged". The pending list caps at 500; older entries are dropped (:317-318).
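The scheduling shape in flow B3 (per-trace coalescing, a semaphore as concurrency cap, a bounded pending list) is sketched below. MiniScheduler is a hypothetical simplification: it always processes the most recent envelope per trace, and an asyncio.sleep(0) stands in for the LLM call. It is not the SynthesisEngine implementation.

```python
import asyncio


class MiniScheduler:
    def __init__(self, max_concurrent: int = 2, pending_cap: int = 500) -> None:
        self._in_flight: set = set()
        self._latest_by_trace: dict = {}
        self._sem = asyncio.Semaphore(max_concurrent)
        self._pending: list = []
        self._cap = pending_cap
        self._tasks: set = set()

    def schedule(self, envelope: dict) -> None:
        trace_id = envelope["trace_id"]
        self._latest_by_trace[trace_id] = envelope
        if trace_id in self._in_flight:
            return  # the running task picks up the latest envelope
        self._in_flight.add(trace_id)
        task = asyncio.create_task(self._run_trace(trace_id))
        self._tasks.add(task)  # keep a reference so the task is not collected
        task.add_done_callback(self._tasks.discard)

    async def _run_trace(self, trace_id: str) -> None:
        try:
            while trace_id in self._latest_by_trace:
                envelope = self._latest_by_trace.pop(trace_id)
                async with self._sem:  # concurrency cap
                    await asyncio.sleep(0)  # stand-in for the LLM call
                    self._pending.append({"trace_id": trace_id, "seq": envelope["seq"]})
                    del self._pending[:-self._cap]  # drop oldest beyond the cap
        finally:
            self._in_flight.discard(trace_id)


async def demo() -> list:
    s = MiniScheduler()
    s.schedule({"trace_id": "t1", "seq": 1})
    s.schedule({"trace_id": "t1", "seq": 2})  # coalesced onto the in-flight task
    s.schedule({"trace_id": "t2", "seq": 1})
    await asyncio.gather(*s._tasks)
    return s._pending
```

Two envelopes for t1 arrive before its task runs, so only one synthesis pass happens for that trace and it sees the later envelope.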
B4. Drift gate (drift.run_all)
Trigger: _synthesize_from_envelope has an LLM proposal in hand.
1. drift.run_all (apollo/learner/drift.py:249) calls check_proposed_pattern_vs_edges (:78) — proposal references must match real outcome-graph edges.
2. Calls check_intent_classification_vs_clusters (:126) — proposed intent classes must match existing clusters.
3. Calls check_weight_swings (:163) — z-score check against existing weight distribution; passes if fewer than 2 priors exist.
4. Calls check_trajectory_coherence (:206) — proposed direction must align with EWMA trajectory; passes if no trajectory yet.
5. Returns DriftCheckResult(approved=all_passed, checks=[per_check_detail]). No LLM involved here — pure math against graph state.
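As an illustration of the "pure math" character of the drift gate, a z-score weight-swing check could be written as below. The z_max threshold of 3.0 and the handling of a zero-variance prior set are assumptions for this sketch; the source only states that the check is z-score based and passes with fewer than 2 priors.

```python
import statistics


def check_weight_swing(proposed: float, priors: list, z_max: float = 3.0) -> bool:
    """Return True when the proposed weight is within z_max sigmas of the priors."""
    if len(priors) < 2:
        return True  # not enough history to estimate a distribution
    mean = statistics.fmean(priors)
    stdev = statistics.stdev(priors)  # sample standard deviation
    if stdev == 0:
        # Assumed behaviour: with identical priors, only an exact match passes.
        return proposed == mean
    return abs(proposed - mean) / stdev <= z_max
```

With priors clustered around 0.5, a proposal of 0.51 passes while a proposal of 0.9 is flagged, and a single prior never blocks anything.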
B5. Evaluator pipeline (_evaluate_envelope)
Trigger: _drain_worker calls _evaluate_envelope (apollo/observer/ingest.py:280) after synthesis schedule for any qualifying observation.
1. _evaluate_envelope:299 calls attribution.get().applied_for(trace_id, service_name) (apollo/evaluator/attribution.py) to find which artifacts the attacher recorded for this trace. Returns early if no attributions.
2. _evaluate_envelope:306 calls signals.detect_signals(envelope, applied_artifact_ids) (apollo/evaluator/signals.py) — returns a list of SignalHit for L3_ERROR / SCHEMA_MISMATCH / USER_FEEDBACK / EVALUATOR_CONFIDENCE. Returns early on no hits.
3. _evaluate_envelope:316 gets the singletons via scoring.get_engine() and recommendations.get_queue().
4. Per signal hit: engine.apply_signal(artifact_id, signal_kind, magnitude) (apollo/evaluator/scoring.py) — pulls the EMA score toward its tier asymptote.
5. After each apply, cascade.cascade_on_l3_dominant(engine, artifact_id) (apollo/evaluator/cascade.py) — returns CascadeOutcome with action in {none, drift_event, recommend_fast_demote, recommend_demote}.
6. If non-none, _evaluate_envelope calls queue.add(Recommendation(...)) (apollo/evaluator/recommendations.py) with score + decomposition + upstream_artifact_ids. Replace-semantics: latest rec for an artifact overrides the prior one.
B6. Curator atomic sequence (every mutation)
Trigger: any of actions.promote / demote / forget / edit / rollback, called from an admin endpoint, an admin-chat tool, or the autonomous sweep.
1. The action calls pause.raise_if_paused (apollo/curator/pause.py:73) first — if curator is paused, raises CuratorPaused; the sequence stops.
2. The action calls policy.allow_or_raise(ActionRequest(kind, actor, artifact_id)) (apollo/curator/policy.py:110) — six hard invariants enforced; raises CuratorPolicyViolation if any tripped.
3. The action calls _copy_current_to_history(artifact_id) (apollo/curator/actions.py:105) — prior version moves to apollo_artifact_history with a retired_at stamp.
4. The action calls _artifacts().create(record, uid=artifact_id) (actions.py:190 for promote; :255 demote; :392 edit) — overwrites current with the new version.
5. The action calls audit.write_audit(ApolloAuditRecord(...)) (apollo/curator/audit.py:95) — required rationale, optional evidence_ref, indefinite=True for critical actions like forget/rollback/pause.
6. The action calls _broadcast_commit(action_kind, artifact_id, actor) (actions.py:123) — fires SSEHub().broadcast({event: "curator_commit", ...}, scope="*"). SSE failures are swallowed; durable state already landed.
7. Returns ActionResult(action, artifact_id, version, audit_record_id, before_version_id, after_version_id) to the caller.
The five mutation entry points: actions.py:148 promote, :222 demote, :297 forget, :349 edit, :424 rollback. All five share the same atomic shape above.
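The seven-step shape shared by the mutation entry points can be sketched against in-memory stores. Everything here is illustrative: apply_mutation, new_state, and the dict-based stores are hypothetical, and the policy gate is reduced to a single rationale check rather than the six invariants the real policy module enforces.

```python
import time
from typing import Callable, Optional


class CuratorPaused(Exception):
    """Raised when a mutation is attempted while the curator is paused."""


def new_state(broadcast: Callable = lambda event: None) -> dict:
    return {"paused": False, "artifacts": {}, "history": [], "audit": [],
            "broadcast": broadcast}


def apply_mutation(state: dict, artifact_id: str, record: dict,
                   actor: str, rationale: str) -> dict:
    if state["paused"]:                                   # 1. pause gate
        raise CuratorPaused(artifact_id)
    if not rationale:                                     # 2. policy gate (stand-in)
        raise ValueError("rationale is required")
    prior: Optional[dict] = state["artifacts"].get(artifact_id)
    if prior is not None:                                 # 3. copy current to history
        state["history"].append({**prior, "retired_at": time.time()})
    version = prior["version"] + 1 if prior else 1
    state["artifacts"][artifact_id] = {**record, "version": version}  # 4. overwrite
    state["audit"].append({"artifact_id": artifact_id, "actor": actor,
                           "rationale": rationale, "version": version})  # 5. audit
    try:                                                  # 6. broadcast, best effort
        state["broadcast"]({"event": "curator_commit", "artifact_id": artifact_id})
    except Exception:
        pass  # durable state already landed; a broadcast failure must not undo it
    return {"artifact_id": artifact_id, "version": version,  # 7. result
            "audit_record_id": len(state["audit"]) - 1}
```

The ordering matters: both gates run before any write, and the broadcast comes last so that a failing subscriber channel cannot affect what was persisted.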
B7. Autonomous curator sweep (M12)
Trigger: oracle.app:65 schedules auto.run_periodic as a long-lived task at startup; fires on a periodic interval.
1. auto.run_periodic (apollo/curator/auto.py:248) calls auto.sweep_once on each tick.
2. sweep_once (apollo/curator/auto.py:112) reads settings.APOLLO_CURATOR_AUTONOMOUS. Disabled → returns {ran: False, reason: "disabled"}.
3. sweep_once calls pause.is_paused() (apollo/curator/pause.py:65). Paused → returns {ran: False, reason: "paused"}.
4. For each status: "approved" proposal in SynthesisEngine().pending_snapshot():
   - sweep_once calls derive_artifact_id(proposal) (auto.py:54) — deterministic prefix-hash so repeated proposals converge on one artifact.
   - sweep_once calls actions.promote(actor="curator_auto", trigger="autonomous_curator", ...) — runs flow B6.
5. For each kind in ("demote", "fast_demote") recommendation in RecommendationQueue.snapshot():
   - sweep_once calls actions.demote(actor="curator_auto", trigger="autonomous_curator", evaluator_score=..., score_decomposition=..., upstream_artifact_ids=...) — runs flow B6.
6. sweep_once returns {ran: True, auto_promoted: N, auto_demoted: M, drift_retained: K}. Drift-class work (status: "drift_flagged" and kind: "drift_event") is left for admin review.
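A deterministic prefix-hash of the kind derive_artifact_id is described as using might look like this sketch. The choice of hashed fields (type, applicability, content), the SHA-256 digest, the 16-character prefix, and the "art-" label are all assumptions for illustration; the source does not specify them.

```python
import hashlib
import json


def derive_artifact_id_sketch(proposal: dict, prefix_len: int = 16) -> str:
    """Hash a canonical form of the proposal so equal proposals share an ID."""
    key = {
        "type": proposal.get("type"),
        "applicability": proposal.get("applicability"),
        "content": proposal.get("content"),
    }
    # sort_keys makes the serialization independent of dict insertion order.
    canonical = json.dumps(key, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"art-{digest[:prefix_len]}"
```

Two proposals that differ only in key order or in fields outside the hashed set map to the same ID, so a repeated promote overwrites one artifact instead of creating a sibling.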
B8a. Prioritization-layer cross-cuts (Layers 1, 4-A, 6-A/B)
Where each layer hooks in. The prioritization rebuild added several call sites scattered across existing flows. Listed here together so a reader can map "what fires when" without re-reading each parent flow.
- Layer 1 — capped-lineage persist. Inside attacher._attach (apollo/guidance/attacher.py) after _apply_attach_caps returns the (kept, dropped_pairs) tuple. If trace_id is present and dropped_pairs is non-empty, the attacher calls lineage.persist_capped(trace_id, scope, capped=dropped_pairs) (apollo/lineage/persist.py). Fire-and-forget; writes kind: "capped" rows to apollo_lineage_events with artifact_type + scope. Failures land in apollo_lineage_capped_persist_failed_total (no log noise).
- Layer 4-A — evaluator score writeback. Inside _evaluate_envelope after engine.apply_signal() in step 4 of flow B5. The ingest worker calls persist.persist_score_to_artifact(artifact_id, score, decomposition) (apollo/evaluator/persist.py) — fire-and-forget Painless script update writing content.evaluator_score and content.score_decomposition. The next attach call's _sort_key reads it. Kill switch: APOLLO_EVALUATOR_PERSIST_SCORES_ENABLED.
- Layer 6-A — promote-time embedding. Inside actions.promote (step in flow B6), right before _artifacts().create, the action calls _embed_and_find_similar (apollo/curator/actions.py) which:
  - Calls similarity.compute_embedding(content, artifact_type) (apollo/learner/similarity.py) — reuses axonis.memory.embedder.embed. Returns None if sentence-transformers is unavailable (graceful degradation).
  - If embedding succeeded, stores it under content.embedding_vector so the new record persists it on write.
- Layer 6-B — similarity advisory. Same _embed_and_find_similar helper then calls _load_active_set_for_similarity(artifact_type) and similarity.find_similar_active_artifacts(...). Hits ≥ APOLLO_SIMILARITY_THRESHOLD (default 0.9) are returned to the promote handler and surface in ActionResult.similar_artifacts. The promote still succeeds — advisory only.
B8b. Coalescer sweep (Layer 6-C)
Trigger: oracle.app:_coalescer_task is created at startup when APOLLO_COALESCER_ENABLED=true (off by default); fires every APOLLO_COALESCER_INTERVAL_SEC (default 3600s).
1. coalescer.run_periodic (apollo/learner/coalescer.py) sleeps APOLLO_COALESCER_INTERVAL_SEC then calls run_sweep_once on each wake.
2. run_sweep_once calls _load_all_active_artifacts — single ES scan returning every status=active artifact.
3. _find_clusters partitions by (type, applicability.service_name, applicability.tool_name). Within each partition, _pairwise_cluster runs union-find over cosine similarity ≥ APOLLO_COALESCER_THRESHOLD (default 0.85). Yields clusters of ≥ 2 members.
4. Bounded by APOLLO_COALESCER_MAX_CLUSTERS_PER_RUN (default 5) — extras counted as summary.skipped.
5. For each in-budget cluster:
   - _propose_merger_for_cluster(cluster, client) calls Apollo's LLM via build_coalesce_prompt(artifact_type, cluster) (apollo/learner/prompts.py). The LLM writes a single artifact whose content covers every member's intent; the helper injects supersedes: [id1, id2, ...] listing the cluster members.
   - _record_merger_proposal routes through SynthesisEngine._record_proposal (apollo/learner/synthesis.py) so the merger gets confidence normalization, drift checks, and lands on apollo_proposals like any other proposal.
6. Bad JSON / LLM failures increment apollo_coalescer_merge_failed_total and skip the cluster — sweep continues.
7. Returns {clusters_found, proposals_emitted, skipped} for /stats and tests.
Admin downstream. When the admin promotes a coalescer-emitted proposal via POST /api/v1/apollo/artifacts/{merger_id}/promote with supersede: true, the promote handler (flow B6, extended) reads proposal.supersedes and demotes each listed artifact in the same atomic batch (each via flow B6's demote path, each with its own audit record).
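The clustering step inside the coalescer sweep (union-find over pairwise cosine similarity, keeping clusters of at least two members) can be sketched in plain Python. The function names and the dict-of-vectors input are hypothetical; the real _pairwise_cluster operates on artifact records within one partition.

```python
import math


def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def pairwise_cluster(vectors: dict, threshold: float = 0.85) -> list:
    """Group IDs whose embeddings are transitively similar at or above threshold."""
    parent = {k: k for k in vectors}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    ids = sorted(vectors)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)  # union the two sets

    groups: dict = {}
    for k in ids:
        groups.setdefault(find(k), []).append(k)
    # Singletons are not merge candidates.
    return sorted(g for g in groups.values() if len(g) >= 2)
```

Union-find makes membership transitive: if A is similar to B and B to C, all three land in one cluster even when A and C fall below the threshold on their own.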
B8. Maintenance tick (M13)
Trigger: oracle.app:73 schedules maintenance.run_periodic as a long-lived task at startup; fires every APOLLO_MAINTENANCE_INTERVAL (default 1h).
1. maintenance.run_periodic (apollo/maintenance.py:35) calls run_once on each tick.
2. run_once:61 (apollo/maintenance.py:51) calls _purge_expired (:82) — for each of apollo_observations, apollo_audit, apollo_graph_snapshots: builds {"range": {"expires_ts": {"lt": now}}} and calls _delete_by_query (:114). Indefinite records (expires_ts: null) are skipped automatically by the range filter.
3. run_once:65 calls _coarsen_snapshots (:146) — hourly→daily→weekly tiering hook. Currently a no-op stub; tier-generation logic deferred per Q5.
4. run_once:69 calls _emit_metrics (:170) — sets apollo_maintenance_last_run_ts and increments apollo_maintenance_docs_deleted_total{index} per index.
/stats read-side helpers are siblings: degraded_emitters (:186) scans INGEST_LAST_INGEST_TS; intent_schema_coverage (:214) computes the rolling fraction.
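The purge filter from the maintenance tick can be illustrated with a query builder plus a local predicate that mimics how a range filter treats missing values. Both functions are hypothetical, the ISO-8601 timestamp format is an assumption, and wrapping the range clause under a top-level "query" key follows the usual Elasticsearch request-body shape rather than anything stated above.

```python
from datetime import datetime, timezone


def build_purge_query(now: datetime) -> dict:
    """Request body selecting documents whose expires_ts is strictly before now."""
    return {"query": {"range": {"expires_ts": {"lt": now.isoformat()}}}}


def would_purge(doc: dict, now: datetime) -> bool:
    """Local mimic of the range filter: null or missing expires_ts never matches."""
    expires = doc.get("expires_ts")
    return expires is not None and datetime.fromisoformat(expires) < now
```

An indefinite record carries expires_ts: null, so it cannot satisfy the range clause and survives every sweep without any special-casing.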
C. Admin / lifecycle paths
C1. Admin chat loop (M11)
Trigger: admin POSTs /api/v1/apollo/chat with {"action": "chat", "message": "..."}.
1. chat.server.admin_chat (apollo/chat/server.py:79) validates the admin role + parses the request.
2. For action: "chat", admin_chat:117 calls _run_chat_loop(message, actor) (:196).
3. _run_chat_loop:203 calls _render_tool_catalog_for_llm (:303) to build the system context.
4. _run_chat_loop:209 gets LLMClient.get() and starts the bounded iteration loop (_MAX_CHAT_ITERATIONS = 6).
5. Per iteration, :220 calls client.complete(messages, response_format="json").
6. If LLM returned {"action": "call_tool", ...}: :253 calls _run_tool_or_400(tool_name, arguments, actor) (:134) which looks up chat_tools.TOOL_IMPLEMENTATIONS[tool_name] (apollo/chat/tools.py:538) and invokes it. Mutation tools route through actions.* (flow B6) with trigger="admin_chat".
7. _run_chat_loop appends the tool result to messages and continues the loop.
8. If LLM returned {"action": "respond", "text": ...}: returns the prose + tool_trail to admin_chat, which returns the JSON response to the operator.
For action: "invoke", admin_chat:109 skips the loop and calls _run_tool_or_400 directly — single-shot tool invocation without the LLM.
C2. Pause / resume curator (M11)
Trigger: admin calls pause_curator(...) (chat tool or REST endpoint).
1. pause.set_paused (apollo/curator/pause.py:80) sets the module-level _state (in-memory only — by design, not persisted across restart).
2. set_paused:107 calls audit.write_audit(ApolloAuditRecord(action=PAUSE_CURATOR, indefinite=True, ...)) — pause records never expire.
3. set_paused:110 calls SSEHub().broadcast({event: "curator_paused", ...}) so admin clients see the freeze in real time.
4. From this moment, every Curator function calls raise_if_paused (apollo/curator/pause.py:73) at the top (step 1 of B6), which raises CuratorPaused until clear_paused (:125) runs the inverse (audit at :151 + broadcast at :154).
pause.is_paused (:65) is the pure read — used by sweep_once (B7) and by chat tools that want to surface the freeze without raising.
C3. Startup
Trigger: oracle process boots; oracle.app.startup (apollo/app.py:46) runs from oracle's Starlette lifespan.
1. oracle.app.startup:51 calls ingest_module.startup (apollo/observer/ingest.py:70) — initializes _queue, sets _writer / _graph_writer to defaults, spawns _drain_worker × APOLLO_INGEST_WORKER_CONCURRENCY (:84).
2. oracle.app.startup:55 calls SynthesisEngine().set_graph_getter(...) so the synthesis dispatcher can pull subgraph excerpts from the live in-memory graph_set.
3. oracle.app.startup:58 schedules snapshots_module.run_periodic (hourly graph snapshot loop) as _snapshot_task.
4. oracle.app.startup:65 schedules auto.run_periodic (B7) as _auto_task.
5. oracle.app.startup:73 schedules maintenance.run_periodic (B8) as _maint_task.
What's NOT wired today: graph_set.load_from_records(...) (apollo/learner/graphs.py:340) is referenced in design docs and exercised by tests, but oracle's startup path does not call it. The graph_set comes up empty on each restart and rebuilds as observations stream in. The pause state is also intentionally non-durable — a fresh process always comes up unpaused (per apollo/curator/pause.py docstring).
C4. Secondary-path ingest (admin replay / out-of-process emitter)
Trigger: an admin or out-of-process service POSTs to /api/v1/apollo/observations.
1. TraceparentMiddleware and OAuthMiddleware run as on the /chat path (A1 steps 1–2).
2. guidance.api.post_observations (apollo/guidance/api.py:66) parses the body into an ObservationBatch. Stamps caller_identity from the token if the envelope didn't carry one.
3. post_observations calls ingest.ingest(envelope) (apollo/observer/ingest.py:143) per item — exact same downstream path as the primary in-process call (the rest of B1 from step 1 onward).
4. Returns 202 Accepted with {accepted, dropped} as soon as every envelope is enqueued or counted as full.
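The non-blocking enqueue with an {accepted, dropped} summary can be sketched with a bounded asyncio.Queue. enqueue_batch is a hypothetical helper and the queue size of 2 is chosen only to force a drop in the demo.

```python
import asyncio


def enqueue_batch(queue: asyncio.Queue, envelopes: list) -> dict:
    """Offer every envelope without blocking; count the ones that did not fit."""
    accepted = dropped = 0
    for envelope in envelopes:
        try:
            queue.put_nowait(envelope)
            accepted += 1
        except asyncio.QueueFull:
            dropped += 1  # counted, never raised to the caller
    return {"accepted": accepted, "dropped": dropped}


async def demo() -> dict:
    queue: asyncio.Queue = asyncio.Queue(maxsize=2)
    return enqueue_batch(queue, [{"n": 1}, {"n": 2}, {"n": 3}])
```

The handler can answer as soon as the loop finishes, because put_nowait either succeeds immediately or raises QueueFull; nothing waits on the drain workers.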
Notes for readers
- Flow A1 is the umbrella — A2–A7 are sub-flows it triggers, in roughly that order during a single /chat.
- Flow B6 is the universal mutation shape — every place artifacts change (admin endpoint, admin chat, autonomous sweep) routes through it.
- Apollo never originates a network call. Apollo lives inside oracle's process; every httpx.post you see in flows above is oracle calling out, not Apollo. Apollo's only HTTP surface is the inbound /api/v1/apollo/* routes oracle mounts.
- Observations and guidance go in opposite directions — oracle → Apollo for observations (in-process); Apollo → subscriber for guidance (response-attached). Both ride existing envelopes; neither uses a separate transport.
- Synthesis runs before the evaluator on each envelope (_drain_worker order: write → graph update → synthesis schedule → evaluator). Both are wrapped in defensive try/except so neither can wedge ingest.
- B6 is the atomic boundary, not a transaction. Steps 3–6 are tightly coupled but ES is not transactional across indices — partial failure at step 4 leaves a history record with no live successor. Tracked in §Future Improvements §7.1.
For deeper detail (ambient state tables, telemetry counter inventory, full call graphs), see oracle/specs/APOLLO-TECHNICAL-OVERVIEW.md. For the why behind each design choice, see oracle/specs/SPEC-PLATFORM-14-APOLLO.md §Design Decisions.
Technical Overview
A working technical reference describing the live Apollo runtime: who calls what, when, why, and how. Focused on the call graph, not the code layout. This complements §Function Flow Index (which is the file:line citation index) — this section is the operational call graph: what fires, under what condition, in what process, on what thread, backed by what state.
Process topology — where each component lives
Apollo is a package inside oracle's process. It is not a standalone service; it has no network of its own; it shares oracle's Python interpreter, asyncio event loop, Starlette app, and auth middleware.
| Process | What runs there | Notes |
|---|---|---|
| oracle | Oracle's REST/MCP handlers + Apollo (as a package) | The only externally-reachable service (SPEC-03 §1) |
| cortex | Cortex's Starlette app + MCP handler + request-scoped ApolloGuidanceCache (M14, imported from axonis.core.oracle.guidance_cache) | No Apollo emission code; no ApolloClient; no ApolloIntegration |
| beacon | Beacon's chat ingress + per-provider LLM call | Deferred — beacon has no HTTP connection to oracle today (MCP_SERVER_URL points at cortex direct). L1 subscriber wiring follows the cortex SDK pattern once that connection is designed. |
| parallax | Deferred from Phase 1 | When onboarded, follows the cortex MCP-handler pattern (argument pop + request-scoped cache) |
| other browser / L1 clients | L1 code — composes prompts, renders responses | Optional local ApolloGuidanceCache per session, same pattern as beacon will follow once L1 wiring lands |
| admin CLI / admin browser | Admin tooling — calls /api/v1/apollo/* endpoints | Only admin-role tokens get past the guard |
Consequence. There is no "Apollo server" to deploy independently. There is no IPC between oracle and Apollo. Every call into Apollo from oracle is a direct Python function call on the same event loop.
The one rule that shapes every call graph
Neither L1 nor L3 calls Apollo directly. Both call oracle. Oracle calls Apollo.
Captured as Invariant #14 in the design spec. Every call graph in this section respects it. When you see something that looks like it violates it, re-read — the caller is always oracle or an admin.
Ambient state — what's alive for the duration of a request
Apollo uses three kinds of ambient state. Each has a clearly-scoped lifetime.
| State | Type | Scope | Who sets | Who reads |
|---|---|---|---|---|
| axonis.core.trace._current_traceparent | ContextVar[str] | Per inbound request (per async task) | TraceparentMiddleware | Every emit helper, extract_http_headers, every outbound httpx call that forwards traceparent |
| request.state.token_payload | Attribute on the Starlette request object | Per request | OAuthMiddleware (axonis-core) | Every FastAPI dependency that does auth (require_auth, require_admin) |
| SynthesisEngine._in_flight / _latest_by_trace / _pending | Process-wide dicts on the singleton | Process lifetime | SynthesisEngine.schedule() | Drain worker + pending_snapshot() for admin |
| apollo_artifacts / apollo_artifact_history / apollo_audit | Elasticsearch indices (Milestone 9) | Persistent | curator.actions.* (promote / demote / forget / edit / rollback) | GET /artifacts + GET /audit + subscriber attach path |
| AttributionRegistry._by_trace | Process-wide dict on the M10 singleton | Per-request (aged out at TTL) | attacher.for_l1 / for_l3_agent when called with trace_id= | ingest._evaluate_envelope on every qualifying observation |
| ScoringEngine._scores | Process-wide dict on the M10 singleton | Process lifetime | ingest._evaluate_envelope → engine.apply_signal() | cascade.cascade_on_l3_dominant(), /recommendations, /stats |
| RecommendationQueue._by_artifact | Process-wide dict on the M10 singleton | Process lifetime | ingest._evaluate_envelope → queue.add() | GET /api/v1/apollo/recommendations; cleared by curator.demote() |
| curator.pause._state (PauseState) | Process-wide dataclass (M11) | Process lifetime (reset on restart) | curator.pause.set_paused() / clear_paused() | raise_if_paused() at top of every Curator mutation; oracle.chat.tools.pause_curator / resume_curator |
| Autonomous sweep loop (M12) | asyncio.Task in oracle.app._auto_task | Process lifetime | oracle.curator.auto.run_periodic started from oracle.app.startup | N/A — writes to audit + SSE on each commit |
| Maintenance loop (M13) | asyncio.Task in oracle.app._maint_task | Process lifetime | oracle.maintenance.run_periodic started from oracle.app.startup | Purges expired docs via delete_by_query; updates apollo_maintenance_last_run_ts + apollo_maintenance_docs_deleted_total |
| Synthesis sweep loop | asyncio.Task in oracle.app._sweep_task | Process lifetime | oracle.learner.synthesis.run_sweep_periodic started from oracle.app.startup | Periodic event-independent synthesis pass; updates apollo_synthesis_sweep_* metrics |
| Coalescer loop (Layer 6-C) | asyncio.Task in oracle.app._coalescer_task | Process lifetime | oracle.learner.coalescer.run_periodic started from oracle.app.startup (off by default; APOLLO_COALESCER_ENABLED=true to activate) | Scans active artifacts for similarity clusters; queues LLM-merged proposals on apollo_proposals with supersedes: [...] |
| Score writeback (Layer 4-A) | Fire-and-forget tasks scheduled per signal | Per-signal | oracle.evaluator.persist.persist_score_to_artifact called from ingest._evaluate_envelope | Updates content.evaluator_score + content.score_decomposition on the artifact doc; sort key reads it on next attach |
| SSEHub._subs | Process-wide dict on the singleton | Process lifetime | GET /guidance/stream handler | broadcast() from Curator (M9+), GET /subscribers |
| ingest._queue, ingest._dedup_window | Module-level asyncio objects | Process lifetime | ingest.startup() | ingest.ingest(), drain workers |
| graph_set | Module-level GraphSet instance | Process lifetime | ingest.startup() | Drain workers (extractors), SynthesisEngine, hourly snapshot task |
The /chat call graph — the canonical flow
Every box below is a function/method invocation. The arrows are synchronous calls unless marked [async]. "Oracle" and "Apollo" are both inside the same process; every → between them is an in-process function call.
Inbound: L1 → Oracle
POST /api/v1/chat is oracle's user-facing chat surface, driven by oracle's own LLM tool-use loop in oracle/server/llm/tool_executor.py (5-provider gateway). Today's L1 caller is curl or another direct API client; beacon onboards once a beacon↔oracle connection is wired. This is a separate surface from POST /api/v1/apollo/chat (Apollo's admin chat at oracle/oracle/chat/server.py:79), which runs Apollo's independent MiniMax LLM for talking to Apollo's synthesis brain.
L1 caller (curl / future beacon)
│ POST /chat (HTTP; body: {message, conversation_id, model}; headers: Authorization + traceparent)
▼
oracle uvicorn worker
│
├─▶ TraceparentMiddleware.__call__() [ASGI middleware, outer]
│ └─▶ axonis.core.trace.parse_traceparent(header)
│ └─▶ axonis.core.trace.set_current_traceparent(value) (ContextVar installed)
│
├─▶ OAuthMiddleware.__call__() [axonis-core, next inner]
│ └─▶ validates Bearer; writes request.state.token_payload
│
└─▶ FastAPI router → routes.chat(body, request, token_payload)
Why: the middlewares run outside the handler so trace_id and token_payload are already in scope when business logic runs. Both are per-request ambient state — no code needs to thread them explicitly across ~10 internal function boundaries.
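To make the ambient-state idea concrete, the sketch below parses a W3C traceparent header (version, 32-hex trace-id, 16-hex parent-id, 2-hex flags) and stores it in a ContextVar that deeper code reads without it being passed as an argument. The function names are hypothetical stand-ins and do not claim to match the axonis.core.trace implementation.

```python
import re
from contextvars import ContextVar
from typing import Optional

# W3C Trace Context layout: version-traceid-parentid-flags, lowercase hex.
_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)

_current_traceparent: ContextVar = ContextVar("traceparent", default=None)


def parse_trace_id(header: Optional[str]) -> Optional[str]:
    """Return the 32-hex trace-id, or None when the header is absent or invalid."""
    if not header:
        return None
    match = _TRACEPARENT_RE.match(header.strip().lower())
    if not match:
        return None
    version, trace_id, parent_id, _flags = match.groups()
    # The spec forbids version ff and all-zero trace or parent IDs.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return trace_id


def install(header: Optional[str]) -> None:
    """What a middleware would do once per request (only if the header is valid)."""
    if parse_trace_id(header) is not None:
        _current_traceparent.set(header)


def current_trace_id() -> Optional[str]:
    """What a handler or emit helper would call several layers down."""
    return parse_trace_id(_current_traceparent.get())
```

Once install has run for a request, any function on the same task can call current_trace_id without the value appearing in its signature.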
Handler: oracle's /chat → Apollo (L1 emit)
routes.chat(body, request, token_payload)
│
├─▶ RateLimiter().check(client_id) [local, guardrail]
├─▶ GuardrailPolicy.load() → filter allowed tools
├─▶ ConversationStore().get(conversation_id) [Redis or in-mem fallback]
│
├─▶ axonis.core.trace.current_trace_id() [reads ContextVar]
│ └─▶ parsed value from TraceparentMiddleware
│
├─▶ oracle.hooks.chat.emit_user_prompt(
│ prompt=body.message,
│ conversation_id=body.conversation_id,
│ token_payload=token_payload,
│ trace_id=<from ContextVar>) [async, in-process]
│ │
│ └─▶ oracle.observer.ingest.ingest(envelope)
│ └─▶ _queue.put_nowait(envelope) (non-blocking)
│ ↳ if QueueFull: metrics.INGEST_QUEUE_DROPPED_TOTAL.inc()
│
└─▶ ToolExecutor().run(...) [see tool-use loop]
Why: L1 never reaches Apollo. Oracle extracts L1 signals from the /chat body and calls the observer in-process. emit_user_prompt is the canonical entry point — it builds the envelope, stamps caller_identity from token_payload, and drops it on the async queue. The helper never raises into the handler; if the queue is full, it counts + returns.
Tool-use loop: Oracle ↔ LLM + Oracle → L3 → Apollo
ToolExecutor.run(...)
│
├─▶ for iteration in range(max_iterations):
│
│ ├─▶ llm.router.complete(messages, model, tools, system)
│ │ └─▶ provider-specific SDK call (anthropic / openai / …)
│ │
│ ├─▶ oracle.hooks.chat.emit_llm_turn(...) [async, in-process]
│ │ └─▶ ingest.ingest(envelope type=llm_turn)
│ │
│ ├─▶ if tool_calls:
│ │ for tc in tool_calls:
│ │ │
│ │ ├─▶ registry.get_tool(tool_name) (ServiceRegistry lookup)
│ │ ├─▶ t0 = time.perf_counter()
│ │ ├─▶ _call_backend_tool(...) [see outbound MCP dispatch]
│ │ ├─▶ latency_ms = (now - t0) * 1000
│ │ │
│ │ ├─▶ if error in result_text:
│ │ │ oracle.hooks.chat.emit_tool_error(...) [async]
│ │ │ else:
│ │ │ oracle.hooks.chat.emit_tool_output(...) [async]
│ │ │
│ │ │ (both go through ingest.ingest() → in-process queue)
│ │ └─▶
│ └─▶ else: break
│
└─▶ Meter.record_tokens(client_id, provider, …)
Why: Oracle's LLM loop is the L2 emitter for llm_turn events and the L3 observer for tool_output/tool_error. The emitter wraps the backend call with a timer so latency rides on every envelope. Errors are detected by parsing _call_backend_tool's JSON return (it serializes errors into {"error": ...} rather than raising) so the emit path can cleanly branch between output and error.
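The branch between tool_output and tool_error, together with the latency timer, can be sketched as follows. classify_result and timed_call are hypothetical helpers; the rule shown (a JSON object with a top-level "error" key means error) is an assumption consistent with the description above, not a copy of the executor's check.

```python
import json
import time
from typing import Callable


def classify_result(result_text: str) -> str:
    """Map a backend tool's serialized return value to an observation type."""
    try:
        parsed = json.loads(result_text)
    except (TypeError, ValueError):
        return "tool_output"  # non-JSON text is treated as ordinary output
    if isinstance(parsed, dict) and "error" in parsed:
        return "tool_error"
    return "tool_output"


def timed_call(tool: Callable[..., str], *args) -> dict:
    """Wrap the dispatch with a timer so latency rides on the envelope."""
    t0 = time.perf_counter()
    result_text = tool(*args)
    latency_ms = (time.perf_counter() - t0) * 1000
    return {
        "event_type": classify_result(result_text),
        "latency_ms": latency_ms,
        "output": result_text,
    }
```

Because errors come back as data rather than exceptions, the wrapper needs no try/except around the dispatch itself and the timer covers both outcomes identically.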
Oracle → L3 (outbound MCP dispatch)
_call_backend_tool(registry, tool_name, tool_args, raw_token)
│
├─▶ base_url = registry.get_tool_route(tool_name)
│ (if None → local oracle tool; short-circuit path omitted here)
│
├─▶ headers = {Content-Type, Accept}
├─▶ headers["Authorization"] = f"Bearer {raw_token}"
│
├─▶ axonis.core.trace.get_current_traceparent() [reads ContextVar]
│ └─▶ headers["traceparent"] = <value> (if present)
│
├─▶ tool_info = registry.get_tool(tool_name)
│ if tool_info.component_kind == "agent":
│ oracle.guidance.attacher.for_l3_agent(
│ service_name=tool_info.service_name,
│ tool_name=tool_name,
│ intent_class=None) [in-process; bounded by APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS]
│ └─▶ if not None: tool_args["apollo_guidance"] = <payload>
│ else:
│ (library → skip guidance injection entirely)
│
└─▶ httpx.AsyncClient(timeout=30.0).post(
f"{base_url}/agentspace/mcp",
json={jsonrpc, method="tools/call", params={name, arguments}},
headers=headers)
Why: oracle is the only place that knows the component_kind of each L3 target. It filters libraries out of guidance attachment, preserves Authorization end-to-end for L3 to authenticate the user, and forwards traceparent so L3's own logs correlate with oracle's observations. The attacher runs in-process (Apollo lives here) so there is no network hop to fetch guidance.
L3 side: what cortex does with apollo_guidance
The flow below is what M14 wires (current Phase 1 subscriber for L3). Through M5–M13 the attached field rode the wire but FastMCP stripped it before the tool ran — guidance was effectively dark on the consumption side. M14 adds the pop + cache update + accessor reads.
cortex MCP handler receives tool call (cortex/cortex/server/mcp_handler.py:_handle_tools_call)
│
├─▶ apollo_guidance = arguments.pop("apollo_guidance", None)
├─▶ cache = ApolloGuidanceCache() [from axonis.core.oracle.guidance_cache]
├─▶ cache.update(apollo_guidance) [no-op when None]
├─▶ apollo_cache_var.set(cache) [request-scoped ContextVar]
│
├─▶ call_tool(tool_name, arguments) # arguments no longer carries apollo_guidance
│ ├─▶ during any LLM call inside the tool, the implementation reads:
│ │ cache.get_system_prompt_additions(intent_context) → appended to system prompt
│ │ cache.get_tool_description_overrides(tool_name) → applied per tool
│ │ cache.get_spec_fragments(...) → optional
│ │ cache.get_tool_pairing_hints(...) → optional
│ │ cache.get_active_failure_patterns(...) → optional
│ │ cache.get_service_connection_hints(...) → optional
│ │
│ └─▶ folds the results into its prompts / routing decisions
│
└─▶ returns the MCP response back to oracle (the contextvar resets when the handler returns)
Why: cortex never initiates any call to Apollo. The only Apollo-facing contract it carries is this read path — consume the guidance, apply it, discard it. Cache lifetime naturally scopes to the request because L3 only acts inside a user-request context; there is no background state to maintain between requests.
Parallax is deferred from Phase 1; when it onboards, its MCP handler follows the same pattern (argument pop + request-scoped cache contextvar).
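The pop-and-scope pattern described above can be illustrated with the standard contextvars module. This is a sketch only: `GuidanceCache` is a hypothetical stand-in for ApolloGuidanceCache, and `handle_tools_call` is a simplified, synchronous version of the handler.

```python
import contextvars
from typing import Any, Callable, Dict, Optional

# Request-scoped holder; None means "no guidance for this request".
apollo_cache_var: contextvars.ContextVar = contextvars.ContextVar(
    "apollo_cache", default=None
)


class GuidanceCache:
    """Hypothetical stand-in for ApolloGuidanceCache."""

    def __init__(self) -> None:
        self.payload: Optional[Dict[str, Any]] = None

    def update(self, payload: Optional[Dict[str, Any]]) -> None:
        if payload is not None:  # no-op when None
            self.payload = payload


def handle_tools_call(tool: Callable[..., Any], arguments: Dict[str, Any]) -> Any:
    """Pop guidance, scope the cache to this call, then run the tool."""
    guidance = arguments.pop("apollo_guidance", None)
    cache = GuidanceCache()
    cache.update(guidance)
    token = apollo_cache_var.set(cache)
    try:
        return tool(**arguments)  # arguments no longer carry the guidance key
    finally:
        apollo_cache_var.reset(token)  # reset on return and on exception
```

Resetting in a finally block is what prevents cache state from leaking into the next request handled on the same context.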
Oracle → Apollo (L3 observation)
Happens after the outbound MCP dispatch completes, still inside ToolExecutor.run. See the tool-use loop for the wrapping; the actual emit is:
oracle.hooks.chat.emit_tool_output(
trace_id=<from ContextVar>,
conversation_id=...,
token_payload=token_payload,
service_name=tool_info.service_name, # e.g., "cortex"
tool_name=tool_name,
arguments=<original tool_args, apollo_guidance stripped>,
output=result_text,
latency_ms=...)
│
└─▶ ingest.ingest(envelope)
Why service_name is the L3 target, not "oracle": per the oracle-sole-observer design, oracle is the actual emitter, but the envelope records what was observed. The Evaluator (M10) keys on this field to apply per-service signal gates.
Why apollo_guidance is stripped before recording: Apollo must not observe its own injections as if they were part of the caller's intent. emit_tool_output strips the key explicitly.
Outbound: Oracle → L1
routes.chat(...) continues:
│
├─▶ ConversationStore.append(...) [if conversation_id]
│
├─▶ oracle.hooks.chat.emit_final_response(...) [async, in-process]
│ └─▶ ingest.ingest(envelope type=final_response)
│
├─▶ oracle.guidance.attacher.for_l1(
│ user=token_payload.subject,
│ intent_class=None) [in-process; bounded attach]
│
└─▶ return ChatResponse(response, conversation_id, tool_calls,
model_used, tokens, apollo_guidance=<payload>)
Why: final_response records what actually reached L1 (not what oracle intended — what the envelope carried). apollo_guidance on the response body is the L1 subscription channel; whenever L1's own ApolloGuidanceCache.update(...) is called with this payload, L1's next prompt composition sees the freshest guidance. Oracle's own chat LLM (the tool-use loop that produced this response) consumes the same guidance via the L2 in-process path before assembling the next turn.
Background workers — what runs off the request path
Apollo runs four kinds of background task inside oracle's event loop. None of them blocks request handling.
| Task | Started | Does what | Frequency |
|---|---|---|---|
| _drain_worker × N | ingest.startup() (N = APOLLO_INGEST_WORKER_CONCURRENCY, default 4) | Dequeues envelopes → writes to apollo_observations → runs extractors → upserts graphs → fires SynthesisEngine().schedule() | Continuously; blocks on _queue.get() |
| run_periodic (snapshot loop) | oracle.app.startup() | Serializes the in-memory graph_set → apollo_graph_snapshots | APOLLO_GRAPH_SNAPSHOT_INTERVAL (default 3600s) |
| SynthesisEngine per-trace task | SynthesisEngine.schedule(envelope) | Pulls subgraph → builds prompt → calls LLM → runs drift checks → appends to _pending | Fires per-trace; collapses by trace_id; bounded by APOLLO_SYNTHESIS_MAX_CONCURRENT (default 4) |
| ApolloClient._timer_loop | ApolloClient.start() | Periodic flush of the per-auth-token buffer | APOLLO_INGEST_FLUSH_INTERVAL_MS (default 500ms); secondary path only |
Drain worker call graph (inside ingest._drain_worker)
loop:
envelope = await _queue.get()
│
├─▶ if _is_duplicate(envelope): # (trace_id, event_type, timestamp, service) within window
│ metrics.INGEST_DEDUPE_TOTAL.inc()
│ _queue.task_done()
│ continue
│
├─▶ await _write_with_retry(envelope) (ES write through ApolloObservations UDS store)
│
├─▶ extractors_module.apply(envelope, graph_set) [synchronous; pure Python]
│ └─▶ for each of 5 graphs: upsert nodes + edges + EWMAs, mark dirty
│
├─▶ nodes, edges = graph_set.drain_all_dirty()
├─▶ if (nodes or edges) and _graph_writer:
│ await _graph_writer(nodes, edges) (ES upserts)
│
├─▶ metrics.INGEST_LAST_INGEST_TS.labels(service=…).set(time.time())
│
└─▶ try:
SynthesisEngine().schedule(envelope) [M8; fires LLM pass if event type triggers]
except: log + continue (never wedges ingest)
Why this order: persistence before graph updates before synthesis. If a replay from Elastic ever re-runs extractors over stored observations in arrival order, the resulting graph state will be byte-identical to the original.
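The dedupe step at the top of the drain loop keys on (trace_id, event_type, timestamp, service) within a window. A minimal sketch of such a window follows; the class name, the 60-second default, and the injectable clock are assumptions made for illustration, since the design does not state the window size.

```python
import time
from collections import OrderedDict
from typing import Callable, Tuple

Key = Tuple[str, str, str, str]


class DedupeWindow:
    """Remembers observation keys for window_sec seconds."""

    def __init__(self, window_sec: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._window = window_sec
        self._clock = clock
        self._seen: "OrderedDict[Key, float]" = OrderedDict()  # key -> first seen

    def is_duplicate(self, trace_id: str, event_type: str,
                     timestamp: str, service: str) -> bool:
        now = self._clock()
        # Evict entries older than the window (oldest entries come first).
        while self._seen:
            oldest_key, seen_at = next(iter(self._seen.items()))
            if now - seen_at <= self._window:
                break
            self._seen.popitem(last=False)
        key = (trace_id, event_type, timestamp, service)
        if key in self._seen:
            return True
        self._seen[key] = now
        return False
```

Running the check before the write keeps duplicates out of the stored observations, which is what lets a replay over stored observations reproduce the same graph state.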
Synthesis coalescing
SynthesisEngine.schedule(envelope):
│
├─▶ flavor = _SYNTHESIS_FLAVOR.get(envelope.event_type)
│ if flavor is None: return None (llm_turn, final_response skip)
│
├─▶ self._latest_by_trace[trace_id] = envelope (always record latest)
│
├─▶ if trace_id in self._in_flight: return None (already running — follow-up pass
│ will pick up the newer envelope)
├─▶ self._in_flight.add(trace_id)
└─▶ return asyncio.create_task(self._run_trace(trace_id, flavor))
_run_trace(trace_id, flavor):
│
├─▶ async with self._sem: (bounded concurrency)
│ envelope = self._latest_by_trace.pop(trace_id, None)
│ if not envelope: return
│ await self._synthesize_from_envelope(envelope, flavor)
│
└─▶ finally:
self._in_flight.discard(trace_id)
if trace_id in self._latest_by_trace: (arrived during run → re-queue)
asyncio.create_task(self._run_trace(...))
Why: a burst of three tool_error events on one trace produces one LLM call, not three. The latest envelope wins — that's the one carrying the most recent state.
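The coalescing behavior can be reproduced in a small asyncio sketch. `Coalescer` is a hypothetical, simplified stand-in for SynthesisEngine (no flavors, no drift checks). One deliberate difference from the diagram: when a follow-up pass is re-queued, this sketch marks the trace as in flight again so that a concurrent schedule() call cannot start a second task for the same trace.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Optional, Set


class Coalescer:
    def __init__(self, synthesize: Callable[[Any], Awaitable[None]],
                 max_concurrent: int = 4) -> None:
        self._synthesize = synthesize
        self._sem = asyncio.Semaphore(max_concurrent)
        self._latest_by_trace: Dict[str, Any] = {}
        self._in_flight: Set[str] = set()
        self._tasks: Set["asyncio.Task[None]"] = set()

    def schedule(self, trace_id: str, envelope: Any) -> Optional["asyncio.Task[None]"]:
        self._latest_by_trace[trace_id] = envelope  # always record latest
        if trace_id in self._in_flight:
            return None  # the running or queued pass will pick it up
        self._in_flight.add(trace_id)
        return self._spawn(trace_id)

    def _spawn(self, trace_id: str) -> "asyncio.Task[None]":
        task = asyncio.create_task(self._run_trace(trace_id))
        self._tasks.add(task)  # keep a reference until the task finishes
        task.add_done_callback(self._tasks.discard)
        return task

    async def _run_trace(self, trace_id: str) -> None:
        try:
            async with self._sem:  # bounded concurrency
                envelope = self._latest_by_trace.pop(trace_id, None)
                if envelope is not None:
                    await self._synthesize(envelope)
        finally:
            self._in_flight.discard(trace_id)
            if trace_id in self._latest_by_trace:  # arrived during the run
                self._in_flight.add(trace_id)
                self._spawn(trace_id)

    async def drain(self) -> None:
        """Wait until no synthesis task is left (test helper)."""
        while self._tasks:
            await asyncio.gather(*list(self._tasks))
```

Scheduling three envelopes for one trace before the task gets to run results in a single synthesize call that sees only the last envelope.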
The two ingest paths — primary vs secondary
| Path | Caller | Mechanism | Auth | Used by |
|---|---|---|---|---|
| Primary (in-process) | Oracle's own code (routes.chat, ToolExecutor, mcp/server._proxy) | Direct Python function call: oracle.hooks.chat.emit_*(...) → ingest.ingest(envelope) | None; authenticated at ingress by OAuthMiddleware | Every Phase-1 event type from oracle + cortex (parallax deferred) |
| Secondary (HTTP POST) | ApolloClient in axonis-core (admin replay / out-of-process emitters) | POST /api/v1/apollo/observations with Bearer token + traceparent header; goes through the FastAPI handler which delegates to ingest.ingest() | Bearer token required | Admin replay/seed; future services oracle can't observe via MCP |
Both paths end on the same queue and the same worker pool. The only difference is how the envelope arrives.
Secondary path auth
POST /api/v1/apollo/observations
│
├─▶ TraceparentMiddleware (sets ContextVar)
├─▶ OAuthMiddleware (validates Bearer → request.state.token_payload)
│
└─▶ guidance.api.post_observations(request):
│
├─▶ payload = request.state.token_payload
├─▶ caller = _caller_identity_from_token(payload)
│
├─▶ body = await request.json()
├─▶ batch = ObservationBatch.model_validate(body) (400 on malformed)
│
└─▶ for envelope in batch.observations:
if envelope.caller_identity.username is None:
envelope.caller_identity = caller (stamp from token)
await ingest.ingest(envelope)
Guidance attachment — the two attach points
L1 attach (/chat response)
When: at the bottom of every routes.chat handler, just before constructing ChatResponse.
Who calls: oracle (in-process).
How:
oracle.guidance.attacher.for_l1(user=<subject>, intent_class=<class or None>)
│
├─▶ if not APOLLO_GUIDANCE_ATTACH_ENABLED: return None
│
├─▶ await asyncio.wait_for(
│ oracle.guidance.selectors.match_artifacts(...),
│ timeout=APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS / 1000)
│
├─▶ on timeout:
│ metrics.GUIDANCE_ATTACH_TIMEOUT_TOTAL.labels(scope="l1").inc()
│ return None (response serializes without the field)
│
└─▶ return {"as_of": now(), "artifacts": [...], "rationale_summary": "..."}
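The bounded attach above maps directly onto asyncio.wait_for. The sketch below is illustrative: `bounded_attach` and its `on_timeout` callback are hypothetical names, the callback stands in for the metrics increment, and the payload omits rationale_summary for brevity.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Dict, List, Optional


async def bounded_attach(
    match_artifacts: Callable[[], Awaitable[List[Any]]],
    timeout_ms: int,
    enabled: bool = True,
    on_timeout: Optional[Callable[[], None]] = None,
) -> Optional[Dict[str, Any]]:
    """Return a guidance payload, or None when disabled or over budget."""
    if not enabled:
        return None
    try:
        artifacts = await asyncio.wait_for(
            match_artifacts(), timeout=timeout_ms / 1000
        )
    except asyncio.TimeoutError:
        if on_timeout is not None:
            on_timeout()  # e.g. increment the attach-timeout counter
        return None  # caller serializes the response without the field
    return {"as_of": time.time(), "artifacts": artifacts}
```

Returning None on timeout rather than raising is what keeps a slow selector from ever delaying or failing the /chat response.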
L3 attach (MCP dispatch)
When: inside _call_backend_tool before the outbound httpx POST, and inside the MCP proxy's _proxy before the dispatch.
Who calls: oracle (in-process).
How: same shape as L1 but scope-tagged as l3:<service>. Skipped entirely when tool_info.component_kind == "library".
Subscriber consumption
Subscribers don't call Apollo. They call their own local ApolloGuidanceCache.update(payload) when they receive an envelope with apollo_guidance. The cache is a pure in-process data structure; no network, no auth, no transport.
Synthesis trigger graph
Which observation types trigger which LLM flavor:
| Event type | Flavor | When | Who triggers | Prompt builder |
|---|---|---|---|---|
| user_prompt | intent_pattern | L1 sends /chat | Drain worker post-graph-update | build_intent_pattern_prompt |
| intent_schema | intent_pattern | L1 includes schema block | Drain worker | build_intent_pattern_prompt |
| tool_output | intent_pattern | L3 responds successfully | Drain worker | build_intent_pattern_prompt (strengthens clustering) |
| tool_error | failure_pattern | L3 returns an error | Drain worker | build_failure_pattern_prompt |
| llm_turn | (no trigger) | Every LLM cycle inside oracle | n/a | Too granular; feeds graphs only |
| final_response | (no trigger) | Response about to reach L1 | n/a | Informational; feeds graphs + lineage |
| user_feedback | (future) | Admin endpoint or feedback submission | n/a | Evaluator-only today |
| Admin-initiated | failure_pattern | Admin hits POST /learn | admin.api.trigger_learn | build_failure_pattern_prompt |
Why llm_turn and final_response don't trigger: llm_turn is too granular and final_response is informational, so both feed the graphs without starting a synthesis pass. Every request produces exactly one user_prompt, then many llm_turn and tool_output/tool_error events, then one final_response. Triggering on every llm_turn would thrash the LLM; for the event types that do trigger, per-trace coalescing (see Synthesis coalescing) keeps the number of LLM calls bounded.
Drift gating — the four checks
Every LLM proposal passes through drift.run_all(...) before it can become an approved artifact. The call path:
SynthesisEngine._record_proposal(proposal, subgraph)
│
└─▶ drift_module.run_all(
proposal_id=<uuid>,
proposal=<LLM's JSON output>,
outcome_graph_edges=subgraph["outcome_graph_edges"],
intent_graph_nodes=subgraph["intent_graph_nodes"],
existing_weights=subgraph.get("existing_weights", []),
trajectory=subgraph.get("trajectory"))
│
├─▶ check_proposed_pattern_vs_edges(...) # n/a if non-FailurePattern
├─▶ check_intent_classification_vs_clusters(...) # n/a if non-IntentPattern
├─▶ check_weight_swings(...) # passes if <2 priors
└─▶ check_trajectory_coherence(...) # passes if no trajectory
│
└─▶ DriftCheckResult(approved = all passed)
Why deterministic: drift checks must not themselves call the LLM — that would defeat the point. They are pure math against graph state.
What happens on failure: the proposal is still recorded on the pending list, but status="drift_flagged" and drift_checks[*] carries per-check detail so admins can see exactly which anchor was violated. M9's Curator will refuse to commit flagged proposals autonomously (admin review required).
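A minimal sketch of the "all checks must pass" gate, assuming each check is a pure function of the proposal. `CheckOutcome`, the `weight` field, and the 0.5 swing threshold are hypothetical; only the "passes if fewer than 2 priors" rule and the approved / drift_flagged statuses come from the text above.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class CheckOutcome:
    name: str
    passed: bool
    detail: str = ""


def check_weight_swings(proposal: Dict[str, Any],
                        max_swing: float = 0.5) -> CheckOutcome:
    """Deterministic check: compare the proposed weight with the last prior."""
    priors = proposal.get("existing_weights", [])
    if len(priors) < 2:
        return CheckOutcome("weight_swings", True, "fewer than 2 priors")
    swing = abs(proposal["weight"] - priors[-1])
    return CheckOutcome("weight_swings", swing <= max_swing, f"swing={swing:.2f}")


def run_all(proposal: Dict[str, Any],
            checks: List[Callable[[Dict[str, Any]], CheckOutcome]]) -> Dict[str, Any]:
    """Run every check; the proposal is approved only if all of them pass."""
    outcomes = [check(proposal) for check in checks]
    approved = all(outcome.passed for outcome in outcomes)
    return {
        "approved": approved,
        "status": "approved" if approved else "drift_flagged",
        "drift_checks": outcomes,  # per-check detail for admin review
    }
```

Every check still runs after the first failure, so a flagged record carries the detail for all anchors, not just the first one violated.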
Admin call paths
Every admin endpoint routes through the same require_admin FastAPI dependency.
admin.api.require_admin(request) -> payload
│
├─▶ payload = request.state.token_payload (from OAuthMiddleware)
├─▶ if "admin" not in payload.roles: raise 403
└─▶ return payload
Endpoint call graph
| Endpoint | Handler | Calls | Returns |
|---|---|---|---|
| GET /memories | admin.api.list_memories | ApolloObservations().read(query) | Filtered observation list |
| GET /memories/{uid} | admin.api.get_memory | ApolloObservations().read(uid=uid) | One doc or 404 |
| POST /memories | admin.api.seed_memory | ingest.ingest(envelope) (primary path, admin-stamped caller_identity) | {accepted, trace_id, event_type} |
| PATCH /memories/{uid} | admin.api.patch_memory | ApolloObservations().update({tags, admin_note}, uid) | {patched, uid, fields} |
| DELETE /memories/{uid} | admin.api.forget_memory | ApolloObservations().delete(uid) | {forgotten, uid} |
| GET /artifacts | admin.api.list_artifacts | ApolloArtifacts().read(match_all) + SynthesisEngine().pending_snapshot() | {active: [...], pending: [...], count: {active, pending}} |
| GET /audit | admin.api.list_audit | ApolloAudit().read(query) with term filters on action/actor/artifact_id + timestamp range | Audit records sorted timestamp-desc |
| GET /recommendations | admin.api.list_recommendations | RecommendationQueue().snapshot() | {recommendations: [...], count: N} |
| POST /artifacts/{id}/promote | admin.api.promote_artifact | Fetches proposal from SynthesisEngine._pending → curator.promote(...) | ActionResult.to_dict() |
| POST /artifacts/{id}/demote | admin.api.demote_artifact | curator.demote(...) with optional evaluator_score / score_decomposition / upstream_artifact_ids threaded into audit | ActionResult.to_dict() |
| POST /artifacts/{id}/rollback | admin.api.rollback_artifact | curator.rollback(artifact_id, target_version, ...) | ActionResult.to_dict() |
| PATCH /artifacts/{id} | admin.api.edit_artifact | curator.edit(artifact_id, content_patch, applicability_patch, ...) | ActionResult.to_dict() |
| DELETE /artifacts/{id} | admin.api.forget_artifact | curator.forget(artifact_id, ...) | ActionResult.to_dict() |
| GET /guidance?scope=l1 | admin.api.preview_guidance | attacher.for_l1(...) | Current attachable payload for that scope |
| GET /guidance?scope=l3:<svc> | admin.api.preview_guidance | attacher.for_l3_agent(service_name=..., tool_name=None) | Current L3 payload |
| GET /subscribers | admin.api.list_subscribers | SSEHub().subscribers_snapshot() | Connected SSE clients |
| GET /guidance/stream | admin.api.guidance_stream | SSEHub().subscribe(scope) then streams via sse_event_stream | text/event-stream |
| POST /learn | admin.api.trigger_learn | SynthesisEngine().schedule_admin_initiated(scope=...) | {accepted: true, scope} |
| POST /chat (admin, action: invoke) | chat.server.admin_chat | chat.tools.TOOL_IMPLEMENTATIONS[tool](**arguments) | Tool result |
| POST /chat (admin, action: chat) | chat.server._run_chat_loop | Apollo LLM loop → TOOL_IMPLEMENTATIONS[picked_tool](**args) per iteration → final prose | {response, tool_calls, iterations, conversation_id} |
| POST /chat (admin, action: list_tools) | chat.server.admin_chat | chat.tools.TOOL_DEFINITIONS | Tool catalog for UI rendering |
| GET /audit filter params (M13) | admin.api.list_audit | Adds trigger (term) + artifact_type (prefix on artifact_id) | Filtered records sorted timestamp-desc |
| GET /stats (M13 polish) | guidance.api.get_stats | Returns metric snapshot + degraded_emitters (via maintenance.degraded_emitters()) + intent_schema_coverage (via maintenance.intent_schema_coverage(24)) | Status + metrics + degraded list + coverage fraction |
| POST /observations | guidance.api.post_observations | ingest.ingest(envelope) per item | {accepted, dropped} |
| GET /stats | guidance.api.get_stats | metrics.snapshot() | Every Prometheus counter's current values |
SSE fan-out (driven by Curator commits from M9+)
curator.actions.promote/demote/forget/edit/rollback(...)
│
├─▶ allow_or_raise(ActionRequest) (policy gate)
├─▶ _copy_current_to_history(artifact_id) (prior version → apollo_artifact_history)
├─▶ ApolloArtifacts.create(new_record, uid=artifact_id) (store mutation)
├─▶ write_audit(ApolloAuditRecord(...)) (apollo_audit)
│
└─▶ _broadcast_commit(action, artifact_id, actor)
│
└─▶ SSEHub().broadcast(
{event: "curator_commit", action, artifact_id, actor, ts},
scope="*")
│
└─▶ for sub in self._subs.values():
if scope matches or sub.scope == "*":
sub.queue.put_nowait(event)
↳ if QueueFull: drop (slow consumer reconnects)
A subscribed admin client's streaming response pulls from sub.queue and yields text/event-stream bytes. SSE broadcast failures are swallowed by _broadcast_commit: the durable state (store + audit) is what matters; the SSE feed is cosmetic.
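The drop-on-full fan-out can be sketched with bounded asyncio queues. `Hub` is a hypothetical stand-in for SSEHub, and the scope-matching rule shown (a "*" on either side matches everything) is one plausible reading of the diagram, not a confirmed contract.

```python
import asyncio
from typing import Any, Dict, Tuple


class Hub:
    def __init__(self) -> None:
        self._subs: Dict[str, Tuple[str, "asyncio.Queue[Any]"]] = {}

    def subscribe(self, sub_id: str, scope: str, maxsize: int = 8) -> "asyncio.Queue[Any]":
        queue: "asyncio.Queue[Any]" = asyncio.Queue(maxsize=maxsize)
        self._subs[sub_id] = (scope, queue)
        return queue

    def broadcast(self, event: Any, scope: str = "*") -> int:
        """Deliver to matching subscribers; return how many accepted the event."""
        delivered = 0
        for sub_scope, queue in self._subs.values():
            if scope == "*" or sub_scope == "*" or sub_scope == scope:
                try:
                    queue.put_nowait(event)
                    delivered += 1
                except asyncio.QueueFull:
                    # Slow consumer: drop the event rather than block the commit.
                    pass
        return delivered
```

Using put_nowait means a stalled subscriber can never hold up a Curator commit; the cost is that the slow client misses events and has to reconnect.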
Evaluator call graph (Milestone 10)
Every qualifying observation that the drain worker processes runs through this pipeline before the worker moves on:
drain_worker
│ (after extractors + graph update)
│
└─▶ _evaluate_envelope(envelope)
│
├─▶ AttributionRegistry.applied_for(trace_id, service_name=...)
│ └─▶ returns [artifact_id, ...] that were attached to this trace
│ (empty when the attacher never recorded anything for the
│ trace — common pre-M9 / during bootstrap; skip early)
│
├─▶ signals.detect_signals(envelope, applied_artifact_ids=...)
│ │
│ ├─ TOOL_ERROR → L3_ERROR signals
│ ├─ TOOL_OUTPUT → SCHEMA_MISMATCH (agent-only) +
│ │ EVALUATOR_CONFIDENCE (if gap > 0)
│ ├─ USER_FEEDBACK → USER_FEEDBACK (negative sentiments)
│ └─ FINAL_RESPONSE + gap → EVALUATOR_CONFIDENCE
│
│ Library services are DARK for SCHEMA_MISMATCH (Q9).
│ component_kind is looked up via ServiceRegistry at signal time.
│
└─▶ for each SignalHit:
├─▶ ScoringEngine.apply_signal(artifact_id, kind, magnitude)
│ └─▶ EMA pull toward per-tier asymptote
│ increments signal_counts / magnitude_totals
│ tracks l3_dominant_ticks + ticks_below_threshold
│ appends to l3_timestamps ring buffer
│
├─▶ cascade.cascade_on_l3_dominant(engine, artifact_id)
│ └─▶ CascadeOutcome with action in
│ {none, drift_event, recommend_fast_demote, recommend_demote}
│ and `upstream_flagged` list
│
└─▶ if outcome.action != "none":
RecommendationQueue.add(
Recommendation(
artifact_id,
kind={"recommend_demote": "demote",
"recommend_fast_demote": "fast_demote",
"drift_event": "drift_event"}[action],
reason=outcome.reason,
evaluator_score=score.score,
score_decomposition=score.decomposition(),
upstream_artifact_ids=outcome.upstream_flagged,
))
audit.info("oracle.evaluator.recommendation ...")
Queue replace-semantics. A fresh recommendation for an artifact replaces any prior recommendation for the same artifact — the queue holds the latest high-water snapshot, not an append log. Admins always see the current score, not a stale one.
Cleared by action. When an admin calls curator.demote() with a recommendation's evaluator fields, the demote action removes the recommendation from the queue. No duplicate recommendations appear after the admin acts.
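Replace-semantics and clear-on-action both fall out of keying the queue by artifact id. The sketch below uses a hypothetical `RecQueue` class with plain dict records; the real RecommendationQueue and Recommendation types are not shown in this document.

```python
from typing import Any, Dict, List


class RecQueue:
    """Holds at most one recommendation per artifact (latest wins)."""

    def __init__(self) -> None:
        self._by_artifact: Dict[str, Dict[str, Any]] = {}

    def add(self, rec: Dict[str, Any]) -> None:
        # A fresh recommendation replaces any prior one for the same artifact.
        self._by_artifact[rec["artifact_id"]] = rec

    def clear(self, artifact_id: str) -> None:
        # Called once an admin (or the auto-curator) has acted on the artifact.
        self._by_artifact.pop(artifact_id, None)

    def snapshot(self) -> List[Dict[str, Any]]:
        return list(self._by_artifact.values())
```

For example, adding two recommendations for the same artifact leaves one entry carrying the second score, and clearing that artifact leaves the queue empty.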
Admin-chat LLM loop (Milestone 11)
Natural-language admin prompt → LLM picks tool → tool runs → LLM reads result → LLM composes final prose. The loop is bounded at _MAX_CHAT_ITERATIONS = 6 so a misbehaving LLM can't thrash.
POST /chat {"action": "chat", "message": "why did you demote pshim_xyz?"}
│
└─▶ chat.server._run_chat_loop(message, actor)
│
├─▶ messages = [{"role": "user", "content": "Tool catalog:\n... Admin message: why did you demote pshim_xyz?"}]
│
└─▶ for iteration in range(_MAX_CHAT_ITERATIONS):
│
├─▶ LLMClient.get().complete(system=_SYSTEM_PROMPT, messages, response_format="json")
│ └─▶ returns LLMResponse with parsed JSON
│
├─▶ parsed["action"] == "call_tool":
│ ├─▶ impl = TOOL_IMPLEMENTATIONS[parsed["tool"]]
│ ├─▶ result = impl(actor="admin:<username>", **parsed["arguments"])
│ │ │ (mutations raise CuratorPaused if pause flag is on →
│ │ │ caught and surfaced as {"ok": False, "error": "curator_paused"})
│ ├─▶ tool_trail.append({"tool": ..., "result": result})
│ ├─▶ messages.append(assistant_turn + tool_result_turn)
│ └─▶ continue loop
│
├─▶ parsed["action"] == "respond":
│ └─▶ return {"response": parsed["text"], "tool_calls": tool_trail, ...}
│
└─▶ malformed JSON → append nudge, try again until budget exhausts
Every mutation picked by the LLM flows through the M9 Curator atomic sequence (policy gate → history → store → audit → SSE broadcast), using trigger="admin_chat" on the audit record to distinguish chat-driven actions from direct REST calls.
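The bounded loop can be sketched without a real LLM by passing in a callable that returns the model's raw text. Everything here is illustrative: `run_chat_loop`, the nudge wording, and the unknown_tool error are assumptions; only the action names, the iteration cap of 6, and the tool_trail shape follow the diagram.

```python
import json
from typing import Any, Callable, Dict, List

MAX_CHAT_ITERATIONS = 6


def run_chat_loop(llm: Callable[[List[Dict[str, str]]], str],
                  tools: Dict[str, Callable[..., Any]],
                  message: str,
                  max_iterations: int = MAX_CHAT_ITERATIONS) -> Dict[str, Any]:
    messages = [{"role": "user", "content": message}]
    tool_trail: List[Dict[str, Any]] = []
    nudge = {"role": "user", "content": "Reply with a valid JSON action."}
    for iteration in range(max_iterations):
        raw = llm(messages)
        try:
            parsed = json.loads(raw)
        except ValueError:
            messages.append(nudge)  # malformed JSON: nudge and retry
            continue
        action = parsed.get("action") if isinstance(parsed, dict) else None
        if action == "call_tool":
            impl = tools.get(parsed.get("tool"))
            if impl is None:
                result = {"ok": False, "error": "unknown_tool"}
            else:
                result = impl(**parsed.get("arguments", {}))
            tool_trail.append({"tool": parsed.get("tool"), "result": result})
            messages.append({"role": "assistant", "content": raw})
            messages.append({"role": "user", "content": json.dumps(result)})
            continue
        if action == "respond":
            return {"response": parsed.get("text", ""),
                    "tool_calls": tool_trail, "iterations": iteration + 1}
        messages.append(nudge)  # valid JSON but not a known action
    # Budget exhausted without a final answer.
    return {"response": None, "tool_calls": tool_trail,
            "iterations": max_iterations}
```

The hard cap is the only thing standing between a confused model and an unbounded loop, so the sketch counts malformed replies against the same budget as tool calls.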
Curator pause gate (Milestone 11)
oracle.curator.pause._state: PauseState
│
├─▶ set_paused(actor, rationale)
│ ├─▶ writes ApolloAuditRecord(action=PAUSE_CURATOR, indefinite=True)
│ └─▶ SSEHub().broadcast({event: curator_paused, by, reason, ts}, scope="*")
│
├─▶ clear_paused(actor, rationale)
│ ├─▶ writes ApolloAuditRecord(action=RESUME_CURATOR, indefinite=True,
│ │ evidence_ref={prior_paused_by, prior_paused_reason})
│ └─▶ SSEHub().broadcast({event: curator_resumed, by, ts}, scope="*")
│
└─▶ raise_if_paused()
└─▶ called at the TOP of every mutation in apollo/curator/actions.py
plus rollback_graph and trigger_synthesis in apollo/chat/tools.py
raises CuratorPaused(state) when flag is on
Mutation coverage. promote, demote, forget, edit, rollback (curator); rollback_graph, trigger_synthesis (chat tools). resume_curator itself is intentionally NOT gated — that's how admins get out of pause.
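A small sketch of the gate, assuming a single in-memory state object. `PauseState`, `promote`, and `resume_curator` here are simplified stand-ins: they skip the audit write and SSE broadcast and only show where the check sits.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


class CuratorPaused(Exception):
    """Raised by any gated mutation while the pause flag is on."""


@dataclass
class PauseState:
    paused: bool = False
    by: Optional[str] = None
    reason: Optional[str] = None

    def set_paused(self, actor: str, rationale: str) -> None:
        self.paused, self.by, self.reason = True, actor, rationale

    def clear_paused(self) -> None:
        self.paused, self.by, self.reason = False, None, None

    def raise_if_paused(self) -> None:
        if self.paused:
            raise CuratorPaused(f"paused by {self.by}: {self.reason}")


def promote(state: PauseState, artifact_id: str) -> Dict[str, Any]:
    state.raise_if_paused()  # first statement of every gated mutation
    return {"ok": True, "artifact_id": artifact_id}


def resume_curator(state: PauseState) -> Dict[str, Any]:
    # Intentionally NOT gated: this is the way out of pause.
    state.clear_paused()
    return {"ok": True}
```

Placing the check as the first statement means a paused mutation fails before it touches history, store, or audit.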
Autonomous Curator sweep (Milestone 12)
A background task started from oracle.app.startup runs every APOLLO_CURATOR_AUTO_INTERVAL_SEC (default 30) seconds. On each tick:
oracle.curator.auto.sweep_once()
│
├─▶ if not settings.APOLLO_CURATOR_AUTONOMOUS: return {ran: False, reason: "disabled"}
├─▶ if is_paused(): return {ran: False, reason: "paused"}
│
├─▶ for rec in SynthesisEngine().pending_snapshot():
│ if rec.status == "approved":
│ artifact_id = derive_artifact_id(rec.proposal)
│ ├─▶ fp_<hash> for FailurePattern, ip_<hash> for IntentPattern,
│ │ ps_<hash> for PromptShim, sf_<hash> for SpecFragment
│ │ — same proposal body → same id (versioning accumulates)
│ └─▶ curator.promote(artifact_id=..., actor="curator_auto",
│ trigger="autonomous_curator", evidence_ref={autonomous: True})
│ └─▶ same M9 atomic sequence — raise_if_paused → policy_gate →
│ history → store → audit → SSE broadcast
│ elif rec.status == "drift_flagged":
│ drift_retained += 1 (admin must review)
│
└─▶ for queued in RecommendationQueue().snapshot():
if queued.kind in ("demote", "fast_demote"):
curator.demote(artifact_id=..., actor="curator_auto",
trigger="autonomous_curator",
evaluator_score, score_decomposition,
upstream_artifact_ids)
elif queued.kind == "drift_event":
drift_retained += 1 (admin must review)
Evolution-class vs drift-class. Evolution work (approved proposals + demote/fast_demote recommendations) auto-commits because it's mechanical — the drift check already validated the proposal; the evaluator already quantified the regression. Drift-class work (drift_flagged proposals + drift_event recommendations) is explicitly retained for admin review because it reflects a divergence the autonomous path shouldn't resolve on its own.
Mid-sweep pause. If an admin pauses between the sweep's flag check and one of its mutations, the individual promote / demote call raises CuratorPaused; sweep_once catches it and returns reason: "paused_mid_sweep" with whatever it committed up to that point. No split-brain state.
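The "same proposal body → same id" property used by the sweep can be obtained by hashing a canonical serialization. The sketch below is an assumption-laden illustration: the `type` field name, SHA-256, and the 12-character digest length are not specified in this document; only the fp_/ip_/ps_/sf_ prefixes are.

```python
import hashlib
import json
from typing import Any, Dict

_PREFIX_BY_TYPE = {
    "FailurePattern": "fp",
    "IntentPattern": "ip",
    "PromptShim": "ps",
    "SpecFragment": "sf",
}


def derive_artifact_id(proposal: Dict[str, Any]) -> str:
    """Derive a stable id from the proposal body (hypothetical scheme)."""
    prefix = _PREFIX_BY_TYPE[proposal["type"]]
    # sort_keys + fixed separators make the serialization order-independent.
    canonical = json.dumps(proposal, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
    return f"{prefix}_{digest}"
```

With a stable id, re-promoting an identical proposal adds a version to the existing artifact instead of creating a new one.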
Maintenance loop (Milestone 13)
A second background task from oracle.app.startup runs every APOLLO_MAINTENANCE_INTERVAL (default 1 hour):
oracle.maintenance.run_once()
│
├─▶ _purge_expired()
│ └─▶ for alias in (apollo_observations, apollo_audit, apollo_graph_snapshots):
│ store.delete_by_query({"range": {"expires_ts": {"lt": now}}})
│ — indefinite records have expires_ts=null → skipped by the range query
│
├─▶ _coarsen_snapshots() # tier generation deferred; hook in place
│
└─▶ _emit_metrics()
├─▶ MAINTENANCE_LAST_RUN_TS.set(now)
└─▶ MAINTENANCE_DOCS_DELETED_TOTAL.labels(index=alias).inc(n)
/stats read-side helpers:
oracle.maintenance.degraded_emitters()
└─▶ scan metrics.INGEST_LAST_INGEST_TS samples
flag any service whose timestamp is > APOLLO_INGEST_STALE_WARN_SEC old
return [{"service", "last_ingest_ts", "seconds_since"}, ...]
oracle.maintenance.intent_schema_coverage(window_hours=24)
└─▶ count observations in the window; count intent_schema within them
return count_intent / count_all (None when window is empty)
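Both read-side helpers reduce to simple arithmetic over data the service already has. The sketch below takes that data as plain arguments (a service → timestamp mapping and a list of event types) so it stays self-contained; the real helpers read from metrics samples and stored observations.

```python
from typing import Any, Dict, List, Optional


def degraded_emitters(last_ingest_ts: Dict[str, float],
                      stale_after_sec: float,
                      now: float) -> List[Dict[str, Any]]:
    """Flag services whose last ingest is older than the stale threshold."""
    flagged = []
    for service, ts in sorted(last_ingest_ts.items()):
        age = now - ts
        if age > stale_after_sec:
            flagged.append({"service": service,
                            "last_ingest_ts": ts,
                            "seconds_since": age})
    return flagged


def intent_schema_coverage(event_types: List[str]) -> Optional[float]:
    """Fraction of observations in the window that are intent_schema."""
    if not event_types:
        return None  # empty window: coverage is undefined, not zero
    count_intent = sum(1 for e in event_types if e == "intent_schema")
    return count_intent / len(event_types)
```

Returning None for an empty window lets /stats distinguish "no traffic" from "traffic with no schemas".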
Curator mutation atomic sequence
Every action in oracle.curator.actions follows the same five-step shape:
1. allow_or_raise(request) # policy gate — fails closed
2. _copy_current_to_history(artifact_id) # prior → apollo_artifact_history
3. ApolloArtifacts.create(new_record) # overwrite current with new version
4. write_audit(ApolloAuditRecord(...)) # audit record with required rationale
5. _broadcast_commit(action, ...) # SSE fan-out to admin subscribers
If step 1 raises, nothing downstream runs, so a blocked action leaves zero durable state. If any later step fails, the partial state is what operators will see on GET /artifacts + GET /audit. Steps 2–4 run back to back in the same call, but they are not a transaction: an ES outage at step 3 will still have written step 2's history record (acceptable, because history is designed to accumulate).
Trace propagation — who sets, who reads
Set (ingress)
TraceparentMiddleware:
│
├─▶ header = scope["headers"]["traceparent"]
├─▶ ctx = parse_traceparent(header)
│
├─▶ if ctx is None:
│ if APOLLO_REQUIRE_TRACEPARENT: return 400
│ if header: metrics.MALFORMED_TRACEPARENT_TOTAL.inc()
│ else: metrics.MISSING_TRACEPARENT_TOTAL.inc()
│ ctx = mint_traceparent() (local mint)
│
└─▶ set_current_traceparent(ctx.format())
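The ingress logic can be sketched with a strict parser for the W3C traceparent layout (version, 32-hex trace id, 16-hex parent id, 2-hex flags). The function names mirror the diagram, but the bodies are assumptions, and `resolve_traceparent` returns the counter name instead of incrementing a real metric.

```python
import re
import secrets
from typing import Dict, Optional, Tuple

_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def parse_traceparent(header: Optional[str]) -> Optional[Dict[str, str]]:
    """Return the parsed fields, or None when missing or malformed."""
    if not header:
        return None
    match = _TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # All-zero ids and version ff are invalid per the W3C spec.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return {"version": version, "trace_id": trace_id,
            "parent_id": parent_id, "flags": flags}


def mint_traceparent() -> str:
    """Create a fresh, sampled traceparent locally."""
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"


def resolve_traceparent(header: Optional[str],
                        require: bool = False
                        ) -> Tuple[int, Optional[str], Optional[str]]:
    """Return (status, traceparent, counter_to_increment)."""
    if parse_traceparent(header) is not None:
        return 200, header.strip(), None
    if require:
        return 400, None, None  # required mode rejects before counting
    counter = "malformed" if header else "missing"
    return 200, mint_traceparent(), counter
```

Minting locally in the non-required mode means every downstream envelope still carries a trace id, even for callers that have not adopted the header yet.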
Read (outbound)
Every place that needs the trace reads the same ContextVar:
| Call site | Purpose |
|---|---|
| routes.chat | Pass trace_id to emit_user_prompt / emit_final_response |
| ToolExecutor emit loop | Pass trace_id to emit_llm_turn / emit_tool_output / emit_tool_error |
| _call_backend_tool headers | Add traceparent header to outbound httpx POST |
| mcp/server._proxy headers | Same |
| extract_http_headers | Forward to any gateway client (MCPClient / RestClient) |
| ApolloClient._post_with_retry | Stamp traceparent header on secondary-path POSTs |
Every envelope oracle emits for one request carries the same trace_id. That's what lets the admin lineage query stitch observations across layers.
Logging + auditing — three channels
Every Apollo module imports these three loggers from axonis.core.logger:
| Logger | Purpose | Rotating file |
|---|---|---|
| log | Routine telemetry (info, warning, debug) | oracle.log |
| error | Exceptions, permanent failures, data-loss events | error.log |
| audit | Important transactions that must be independently traceable | audit.log |
What counts as audit.info():
| Event | Emitted from |
|---|---|
| oracle.admin.memory_seeded / memory_patched / memory_forgotten | M7 admin mutations |
| oracle.admin.learn_requested | M8 POST /learn |
| oracle.chat.list_tools / oracle.chat.invoke | M7 admin chat |
| oracle.synthesis.admin_initiated | M8 admin-driven synthesis pass |
| oracle.synthesis.proposal_recorded | M8 every proposal (approved + drift_flagged) |
| oracle.llm.minimax_local_loaded | M8 first successful load of the local HF checkpoint |
| oracle.curator.audit action=<kind> actor=... artifact=... | M9 every Curator mutation (promote / demote / forget / edit / rollback); one line per audit record written |
| oracle.curator.promoted / .demoted / .forgotten / .edited / .rolled_back | M9 mutation-kind-specific info lines (operational telemetry, not primary audit) |
| oracle.evaluator.recommendation artifact=<id> kind=<demote\|fast_demote\|drift_event> score=<float> reason=<str> | M10 every evaluator recommendation landing on the queue |
| oracle.chat.session_start actor=admin:<name> conv=<id> / session_end / session_timeout | M11 admin chat session lifecycle |
| oracle.chat.tool_call actor=admin:<name> tool=<name> iter=<n> | M11 every tool the LLM picks inside a chat loop |
| oracle.chat.trigger_synthesis / .rollback_graph / .list_tools / .invoke | M11 admin-chat operation audit lines |
| oracle.curator.audit action=pause_curator \| resume_curator artifact=curator:state | M11 pause/resume indefinite audit records |
| oracle.curator_auto.promoted artifact=<id> version=<n> proposal=<id> | M12 every autonomous promote |
| oracle.curator_auto.demoted artifact=<id> kind=<demote\|fast_demote> score=<float> | M12 every autonomous demote from evaluator recommendation |
| oracle.curator_auto.sweep promoted=<n> demoted=<n> drift_retained=<n> | M12 per-sweep summary (only logged when anything changed) |
| oracle.maintenance.completed purged=<dict> coarsened=<dict> | M13 every maintenance pass |
Telemetry — who increments what
All counters are registered at startup with zero values (so dashboards work before traffic). Where applicable, a counter is labeled by service, event type, or kind, as shown in braces below.
| Counter | Incremented by | When |
|---|---|---|
| apollo_ingest_accepted_total{service, event_type} | ingest.ingest() | Envelope successfully enqueued |
| apollo_ingest_queue_dropped_total{service} | ingest.ingest() | Queue full on put_nowait |
| apollo_ingest_dedupe_total{service} | Drain worker | Observation inside dedup window |
| apollo_ingest_queue_depth | Drain worker | Updated after every get |
| apollo_ingest_last_ingest_ts{service} | ingest.ingest() | On successful enqueue (both paths) |
| apollo_ingest_last_drain_ts{service} | Drain worker | After graph update step |
| apollo_ingest_post_failure_total{service, kind} | ApolloClient._post_with_retry | Secondary-path POST failed after retries |
| apollo_ingest_worker_failure_total{service} | Drain worker | Write/graph path failed after retries |
| apollo_guidance_attach_timeout_total{scope} | attacher.for_l1 / for_l3_agent | Attach budget overshoot |
| apollo_missing_traceparent_total | TraceparentMiddleware | No header on inbound request |
| apollo_malformed_traceparent_total | TraceparentMiddleware | Header present but unparseable |
M9 additions — the Curator does not introduce new Prometheus counters in M9; visibility into mutations is through the audit log + SSE feed + admin endpoints.
M13 additions:
| Counter | Incremented by | When |
|---|---|---|
| apollo_maintenance_last_run_ts | maintenance._emit_metrics | End of each maintenance pass |
| apollo_maintenance_docs_deleted_total{index} | maintenance._emit_metrics | After each delete_by_query sweep |
Counters are scraped through GET /api/v1/apollo/stats (JSON) and /api/v1/oracle/metrics (Prometheus text format via oracle's existing endpoint).
Failure modes — what degrades how
| Failure | Apollo effect | L1 / L3 effect |
|---|---|---|
| Ingest queue full | apollo_ingest_queue_dropped_total increments; observation lost | None; request continues |
| Drain worker crashes mid-write | At-least-once retry; after budget, dead-letter JSONL (optional) + counter | None; request already returned |
| Apollo module import fails at startup | Oracle serves /chat without apollo_guidance on the response; MCP dispatches go out without the guidance field; POST /observations returns 503 | L1 sees response without the optional field; L3 gets no guidance but otherwise runs normally |
| attacher.for_l1 / for_l3_agent exceeds timeout | apollo_guidance_attach_timeout_total increments; field omitted from response/dispatch | Same; clients ignore missing optional field |
| LLM call fails inside synthesis | Proposal not recorded; error.error("oracle.synthesis.trace_failed") logged | None; synthesis is background work |
| LLM returns malformed JSON | Proposal dropped silently from pending list; error.error("oracle.synthesis.bad_json") | None |
| Drift check flags proposal | status="drift_flagged" on pending record; admin sees it under GET /artifacts | None |
| OAuthMiddleware rejects token | 401 returned | Standard oracle behavior |
| TraceparentMiddleware in required mode + missing header | 400 returned before handler | L1 must include traceparent to proceed |
Invariant: no failure in Apollo can reach the user's /chat response path with anything worse than "the optional apollo_guidance field isn't there."
What's not wired yet (by milestone)
| Feature | Unblocks | Milestone |
|---|---|---|
| Curator persists approved proposals to apollo_artifacts with version history | Artifact-driven guidance instead of empty sets | ✅ M9 |
| Admin PATCH /artifacts/{id} / promote / demote / rollback | Admin can act on Apollo's proposals | ✅ M9 |
| apollo_audit records with rationale + evidence_ref | Audit review surface | ✅ M9 |
| SSE fan-out on Curator commits | Live admin visibility into mutations | ✅ M9 |
| Evaluator scoring + L3-amplified demotion | Artifacts decay when they stop correlating | ✅ M10 |
| GET /api/v1/apollo/recommendations for admin review of evaluator verdicts | Admin can see which artifacts the evaluator wants demoted | ✅ M10 |
| Demote audit records carry evaluator_score + score_decomposition + upstream_artifact_ids | Audit trail explains why each demote happened | ✅ M10 |
| Admin-chat LLM loop with function-calling-style tool selection | Conversational admin surface | ✅ M11 |
| Full admin tool catalog (promote/demote/rollback/forget/edit/graph rollback/synthesis trigger/pause/resume) | Every admin mutation reachable via chat | ✅ M11 |
| explain_decision / discuss_decision surface audit + evidence in-chat | Admin can ask "why did you do this?" in plain English | ✅ M11 |
| Curator pause gate on every mutation | Emergency off-switch | ✅ M11 |
| Autonomous Curator auto-commit loop (evolution-class) | Curator commits without admin intervention | ✅ M12 |
| Deterministic artifact-id derivation for auto-promoted proposals | Versioning accumulates on one artifact instead of proliferating | ✅ M12 |
| Hourly maintenance + delete_by_query purge | Expired observations/audits cleaned up | ✅ M13 |
| /stats degraded_emitters + intent_schema_coverage | Phase-3 readiness (required-mode flip substrate) | ✅ M13 |
| GET /audit extended with trigger + artifact_type filters | Admin can narrow by mutation origin and artifact kind | ✅ M13 |
| Cortex (L3) consumes apollo_guidance via request-scoped ApolloGuidanceCache | L3 LLM calls inside cortex tools observably change when guidance is attached | ✅ M14 |
| apollo_guidance popped from MCP arguments before tool dispatch; contextvar reset on return / exception | No cross-request leakage of cache state | ✅ M14 |
| Beacon (L1) consumes apollo_guidance via session-scoped ApolloGuidanceCache | L1 LLM prompts include attached PromptShims / FailurePatterns / etc. | ⏸ Deferred: gated on a beacon↔oracle connection (no path today) |
| Parallax onboarding (subscriber + emitter pattern, mirrors cortex) | Parallax-driven workflows visible to + steered by Apollo | ⏸ Deferred: same pattern as cortex when it onboards |
| APOLLO_LLM_LOCAL_MODEL_PATH + thread-pool offload + device mapping for minimax-local | Production-grade local inference | Deferred (post-M14) |
Design Journey
A human-readable walk-through of Apollo's design and what each completed milestone delivered. Written for presentation audiences, not as a reference spec. The full technical contract is the rest of this spec; the build order lives in §Implementation Plan. This section is a narrative: what we are trying to accomplish, how the pieces fit together, and the order in which they came online.
What Apollo is
Apollo is an observation, learning, and guidance layer that lives inside oracle. It watches every request/response flowing through the platform, records what actually happened, reasons about what is working and what isn't, and feeds that reasoning back into the system as guidance — attached to the next response or dispatch, so the guidance reaches the LLMs that need it at the exact moment they need it.
Three goals:
- Observe. Record every meaningful event in the platform with enough lineage that an operator or an automated evaluator can reconstruct "what happened on this request" days or weeks later.
- Ground. Turn that stream of events into deterministic graphs — no LLM reasoning, just accounting — so the system has an objective ledger of reality.
- Advise. Reason over the graphs (with an LLM, bounded by graph-anchor drift checks) to propose improvements to the prompts, tool routing, failure handling, and intent classifications the platform uses — then attach those improvements to the next turn.
Apollo is internal — it has no external surface. Oracle already fronts the platform; Apollo runs as a package inside oracle.
The architectural invariant (the single rule)
The question that drove the most design iteration was: who talks to whom? The answer:
Neither L1 nor L3 ever addresses Apollo directly. Both talk to oracle. Oracle talks to Apollo.
This is invariant #14 in the design spec. Every piece of the system — ingest, guidance delivery, synthesis, admin tooling — respects it. Here is the full flow in six steps:
1. L1 ──────────────▶ Oracle (user's /chat request)
2. Oracle ──▶ Apollo (in-process emit: user_prompt, intent_schema)
3. Oracle ◀── Apollo (in-process: apollo_guidance payload)
4. Oracle ──────────────▶ L3 (MCP dispatch + apollo_guidance)
5. Oracle ◀────────────── L3 (tool response)
Oracle ──▶ L1 (/chat response + apollo_guidance)
6. Oracle ──▶ Apollo (in-process emit: tool_output / tool_error / final_response)
Consequences of this rule:
- Neither L1 (beacon, browser clients) nor L3 (cortex; parallax in a later phase) holds any Apollo credentials, endpoint knowledge, or client code.
- ApolloClient (the HTTP emitter in axonis-core) exists but is reserved for admin replay/seed and out-of-process emitters, meaning services running outside oracle's MCP-dispatch reach. Phase-1 emitters (oracle + cortex) never use it.
- Apollo cannot be "down" independently of oracle because they share a process. If Apollo fails, oracle degrades gracefully (responses serialize without apollo_guidance); if oracle is down, Apollo is moot anyway.
The layered view
┌──────────────────────────────────────────────────────────────┐
│ Layer 1 (L1): Front-facing UI / clients │
│ e.g., beacon, browser clients │
│ - composes prompts, presents responses │
│ - consumes apollo_guidance attached to /chat responses │
│ - never talks to Apollo │
└────────────────────┬─────────────────────────────────────────┘
│ /chat (HTTP + traceparent)
▼
┌──────────────────────────────────────────────────────────────┐
│ Layer 2 (L2): Oracle + Apollo │
│ Oracle: auth, routing, LLM dispatch, tool aggregation, │
│ guidance attachment │
│ Apollo: observe, ground (graphs), advise (LLM synthesis), │
│ curate (admin-driven today; autonomous later) │
└────────────────────┬─────────────────────────────────────────┘
│ MCP tool calls + apollo_guidance
▼
┌──────────────────────────────────────────────────────────────┐
│ Layer 3 (L3): Backend agents + libraries │
│ agents: parallax, cortex (own their own LLM) │
│ libraries: UDS, athena (no LLM, pure compute/IO) │
│ - agents consume apollo_guidance from MCP arguments │
│ - libraries receive no guidance (oracle filters) │
│ - never talks to Apollo — oracle observes the round-trip │
│ and emits on each service's behalf │
└──────────────────────────────────────────────────────────────┘
Milestone journey
The build is ordered so every milestone ships a coherent, merge-ready slice. Oracle remains fully functional throughout — Apollo is additive. Each milestone below records what shipped, what it proved, and why it matters; the canonical build-order contract for the same milestones is in §Implementation Plan.
Milestone 0 — Package scaffolding
What shipped. The oracle/oracle/ directory tree with stub modules matching the design spec's package structure. Apollo mounted into oracle's Starlette app at /api/v1/apollo/*. A single live endpoint: GET /api/v1/apollo/stats returning a bootstrap placeholder. Dependency additions (sentence-transformers, numpy) landed.
What it proved. Apollo can be loaded, mounted, and reached without breaking any pre-existing oracle functionality.
Why it matters. Everything later in the plan mounts on this scaffolding. By separating "wire the skeleton" from "build the brain," each subsequent milestone is a narrow PR instead of a sprawling rewrite.
Milestone 1 — Observation intake
What shipped. The primary in-process ingest path: oracle.oracle.observer.ingest.ingest(envelope) validates a Pydantic envelope and drops it on a bounded asyncio.Queue. A pool of background workers drains the queue, writes to Elasticsearch's apollo_observations index, and dedupes by (trace_id, event_type, timestamp, service) across a configurable window. A secondary HTTP path — POST /api/v1/apollo/observations — wraps the same queue so admin replay and future out-of-process emitters have a route. ApolloClient in axonis-core handles the secondary-path client side (batching + retry + flush on shutdown).
What it proved. Apollo can receive observations from oracle's internals and from the network, enqueue them without blocking the request path, and persist them durably. A 50-envelope batch returns 202 in under 10 ms; queue-full and worker-crash paths both increment counters rather than silently drop.
Why it matters. The whole rest of Apollo sits on this pipe. Ground truth has to arrive reliably before anything can reason about it.
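The non-blocking enqueue, drop counter, and dedupe key described for this milestone can be sketched as follows. This is a minimal illustration, not Apollo's code: Envelope, IngestPipe, dropped_total, and the unbounded dedupe set are hypothetical stand-ins, and the in-memory list replaces the Elasticsearch write.

```python
import asyncio
from dataclasses import dataclass


# Hypothetical stand-in for the Pydantic envelope; only the dedupe fields are modeled.
@dataclass(frozen=True)
class Envelope:
    trace_id: str
    event_type: str
    timestamp: float
    service: str

    def dedupe_key(self) -> tuple:
        return (self.trace_id, self.event_type, self.timestamp, self.service)


class IngestPipe:
    def __init__(self, maxsize: int = 1000) -> None:
        self.queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)
        self.dropped_total = 0   # stands in for apollo_ingest_queue_dropped_total
        self.seen: set = set()   # dedupe window, simplified to an unbounded set
        self.persisted: list = []  # stands in for the apollo_observations index

    def ingest(self, envelope: Envelope) -> bool:
        """Enqueue without blocking the caller; count the drop when the queue is full."""
        try:
            self.queue.put_nowait(envelope)
            return True
        except asyncio.QueueFull:
            self.dropped_total += 1
            return False

    async def drain_once(self) -> None:
        """Drain everything currently queued, skipping duplicates by dedupe key."""
        while not self.queue.empty():
            envelope = self.queue.get_nowait()
            if envelope.dedupe_key() not in self.seen:
                self.seen.add(envelope.dedupe_key())
                self.persisted.append(envelope)
            self.queue.task_done()


async def demo() -> IngestPipe:
    pipe = IngestPipe(maxsize=2)
    first = Envelope("t1", "tool_error", 1.0, "cortex")
    pipe.ingest(first)
    pipe.ingest(first)  # duplicate: accepted by the queue, skipped on drain
    pipe.ingest(Envelope("t2", "tool_output", 2.0, "cortex"))  # queue full: dropped
    await pipe.drain_once()
    return pipe


pipe = asyncio.run(demo())
```

The point of put_nowait over an awaited put is that a full queue costs the request path a counter increment rather than a wait.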
Milestone 2 — Deterministic graph updates
What shipped. Five Decision Graphs backed by apollo_graph_nodes and apollo_graph_edges: intent_tool, prompt_shape, service_routing, outcome, and iteration. Each observation passes through rule-based extractors that produce (nodes_touched, edges_touched); the graph module upserts idempotently and updates short- and long-window EWMA weights. Hourly snapshots land in apollo_graph_snapshots. An in-memory mirror rebuilds from Elastic on startup so the hot path never hits ES.
What it proved. Every observation produces graph mutations with no LLM call. The math is deterministic — 1,000 synthetic observations reproduce byte-identical graph state on replay. EWMA weights converge to the expected values for a known sequence.
Why it matters. This is the grounding layer. When Apollo's LLM proposes "there's a new failure pattern in parallax's fusion_run_start tool," the graph-anchor check (Milestone 8) validates that claim against observed reality rather than letting the LLM hallucinate.
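A short sketch of the dual-window EWMA update and the replay property claimed for this milestone. The smoothing factors (0.5 and 0.1) and the EdgeWeight name are assumptions made for illustration; the spec does not state the actual window parameters.

```python
from dataclasses import dataclass


@dataclass
class EdgeWeight:
    short: float = 0.0  # fast-moving window
    long: float = 0.0   # slow-moving window

    def update(self, x: float, alpha_short: float = 0.5, alpha_long: float = 0.1) -> None:
        # Standard EWMA: new = alpha * sample + (1 - alpha) * old.
        self.short = alpha_short * x + (1 - alpha_short) * self.short
        self.long = alpha_long * x + (1 - alpha_long) * self.long


def replay(observations) -> EdgeWeight:
    """Fold a sequence of samples into one edge weight, in order."""
    edge = EdgeWeight()
    for x in observations:
        edge.update(x)
    return edge


sequence = [1.0, 1.0, 0.0, 1.0]
first_run = replay(sequence)
second_run = replay(sequence)          # same input, same order
steady = replay([1.0] * 200)           # a constant signal converges to that constant
```

Because the update has no randomness and no clock dependence, replaying the same ordered observations yields the same state, which is the property the replay test relies on.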
Milestone 3 — Guidance attach plumbing
What shipped. The apollo_guidance payload shape and the two places oracle attaches it:
- L1 path: oracle's /chat response body carries apollo_guidance for every authenticated caller. Placeholder (see note below).
- L3 path: oracle's MCP dispatches to agent-kind services carry apollo_guidance inside the arguments dict, same pattern as the existing llm_spec injection. library-kind services are filtered out.
The component_kind field was added to ServiceRegistry so the dispatch path can route on it. An in-process attacher returns {as_of, artifacts, rationale_summary} bounded by APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS (default 10 ms); on overshoot the field is simply omitted and the request succeeds.
What it proved. The delivery channel works end-to-end with an empty artifact set. /chat responses carry the field; MCP dispatches carry it; library dispatches don't; attach-timeout paths degrade cleanly.
Why it matters. Symmetric piggybacking — guidance rides the envelopes that were already travelling. No push transport, no long-lived connection, no service-token infrastructure. When synthesis (M8) starts producing real artifacts, the L3 pipe is fully wired (cortex consumes via M14) and the L2 pipe is in-process (oracle's chat LLM consumes via M15 — no transport since oracle hosts Apollo). The L1 attach side is wired on oracle's response; the L1 consumer side (beacon) waits on the L1 ↔ Oracle connection design.
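The attach-timeout behavior from this milestone (omit the field on overshoot, never fail the request) might look like the sketch below. attach_guidance and the two attacher coroutines are hypothetical; only the 10 ms default and the {as_of, artifacts, rationale_summary} shape come from the text, and the as_of value is a placeholder.

```python
import asyncio

ATTACH_TIMEOUT_MS = 10  # mirrors the stated APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS default


async def attach_guidance(response: dict, attacher, timeout_ms: int = ATTACH_TIMEOUT_MS) -> dict:
    """Return the response with apollo_guidance added, or unchanged on timeout/error."""
    try:
        guidance = await asyncio.wait_for(attacher(), timeout=timeout_ms / 1000)
    except asyncio.TimeoutError:
        return response  # overshoot: the optional field is simply omitted
    except Exception:
        return response  # any attacher failure degrades the same way
    return {**response, "apollo_guidance": guidance}


async def fast_attacher() -> dict:
    return {"as_of": "placeholder-timestamp", "artifacts": [], "rationale_summary": ""}


async def slow_attacher() -> dict:
    await asyncio.sleep(0.5)  # far beyond the 10 ms budget
    return {"as_of": "placeholder-timestamp", "artifacts": [], "rationale_summary": ""}


async def demo():
    on_time = await attach_guidance({"answer": "hi"}, fast_attacher)
    late = await attach_guidance({"answer": "hi"}, slow_attacher)
    return on_time, late


on_time, late = asyncio.run(demo())
```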
Milestone 4 — Subscriber SDK (ApolloGuidanceCache)
What shipped. A pure-Python in-process cache that L1 and L3 agents use to consume attached guidance. Single mutation API (update(apollo_guidance_block)) and six canonical accessors (get_system_prompt_additions, get_spec_fragments, get_tool_description_overrides, get_tool_pairing_hints, get_active_failure_patterns, get_service_connection_hints). Artifacts ordered by (weight desc, recency desc). Empty-cache fallback returns empty lists / None without blocking.
Lives in axonis-core/axonis/core/apollo/guidance_cache.py — no HTTP client, no transport, no ML dependency — so SPEC-01's "axonis-core has no ML dependencies" rule is preserved.
What it proved. Any agent — L1 UI, L3 agent service — can consume guidance through a single small class with no knowledge of Apollo's internals. Idempotent update, stable ordering, applicability filtering, clean empty-cache behavior.
Why it matters. The delivery protocol is one-way (Apollo → subscriber) and the SDK reflects that — it's a data sink with read accessors, nothing more. Swapping Apollo out entirely would mean the cache stops updating; agents would see empty results and continue working with pre-Apollo behavior.
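To make the subscriber contract concrete, here is a reduced sketch of a cache with one mutation method and read-only accessors. It is not the real ApolloGuidanceCache: the class name, the artifact fields (kind, weight, updated_ts, text), and the kind strings are assumptions, and only two of the six accessors are shown.

```python
class GuidanceCacheSketch:
    """Data sink with read accessors; holds the most recent guidance block."""

    def __init__(self) -> None:
        self._artifacts: list = []

    def update(self, apollo_guidance_block) -> None:
        # Replace wholesale, so applying the same block twice leaves the same state.
        artifacts = (apollo_guidance_block or {}).get("artifacts", [])
        # Order by weight descending, then recency descending.
        self._artifacts = sorted(artifacts, key=lambda a: (-a["weight"], -a["updated_ts"]))

    def _texts(self, kind: str) -> list:
        return [a["text"] for a in self._artifacts if a["kind"] == kind]

    def get_system_prompt_additions(self) -> list:
        return self._texts("prompt_shim")

    def get_active_failure_patterns(self) -> list:
        return self._texts("failure_pattern")


cache = GuidanceCacheSketch()
before_update = cache.get_system_prompt_additions()  # empty cache: no blocking, no error

block = {
    "as_of": "placeholder-timestamp",
    "rationale_summary": "",
    "artifacts": [
        {"kind": "prompt_shim", "weight": 0.4, "updated_ts": 300, "text": "low weight"},
        {"kind": "prompt_shim", "weight": 0.9, "updated_ts": 100, "text": "high weight, older"},
        {"kind": "prompt_shim", "weight": 0.9, "updated_ts": 200, "text": "high weight, newer"},
        {"kind": "failure_pattern", "weight": 0.7, "updated_ts": 150, "text": "example pattern"},
    ],
}
cache.update(block)
cache.update(block)  # idempotent: second application changes nothing
additions = cache.get_system_prompt_additions()
```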
Milestone 5 — Phase-1 emitter integration (oracle-sole-observer)
What shipped. Oracle became Apollo's sole emitter for every Phase-1 event type. The L3 emission path was redesigned here: cortex carries no Apollo emission code. Oracle observes each MCP round-trip it dispatches to cortex and emits tool_output / tool_error in-process on its behalf. L1 events (user_prompt, final_response, intent_schema, user_feedback) continue to be emitted by oracle in-process from the /chat handler. Oracle's own llm_turn events fire from the tool-executor loop. Parallax was originally part of Phase 1 but has been deferred to a later phase; its emitter and subscriber wiring follow the cortex pattern when it onboards.
Cortex carries exactly one Apollo-facing change at M5: its /service-info registration declares "component_kind": "agent" so oracle knows to attach apollo_guidance on its dispatches. Subscriber-side consumption of that attached guidance lands later, in M14.
What it proved. A full /chat request produces a lineage of observations all emitted by oracle under a single trace_id. Parallax and cortex source trees contain zero imports of ApolloClient or ApolloIntegration and zero lifespan wiring for Apollo.
Why it matters. This operationalized the architectural invariant. Apollo's surface shrinks — fewer client integrations, no service tokens, no per-service emission code to maintain across fleets. Adding a new L3 agent takes one line on its ServiceRegistry record (the component_kind declaration); oracle handles the rest.
Milestone 6 — Trace propagation (W3C traceparent)
What shipped. End-to-end trace stitching via the W3C traceparent header. Four pieces:
- Canonical module (axonis/core/trace.py) in axonis-core with a parser, minter, ContextVar, and current_trace_id() accessor. Pure Python, ~20 lines of parsing, no OpenTelemetry dependency.
- Ingress middleware (TraceparentMiddleware) sits outside OAuthMiddleware on oracle's app. Reads the header, validates it, installs the ContextVar. On missing → mints + increments apollo_missing_traceparent_total. On malformed → mints + increments apollo_malformed_traceparent_total. Required mode (APOLLO_REQUIRE_TRACEPARENT=true) returns 400 on either.
- Propagation helper: axonis_core.gateway.client.extract_http_headers() now forwards traceparent alongside Authorization. Any client that uses this helper (MCPClient, RestClient) inherits forwarding for free.
- Outbound injection: oracle's tool_executor.py and mcp/server.py use httpx directly, so they explicitly add the traceparent header on every outbound MCP POST.
What it proved. A single /chat request produces observations across every Phase-1 boundary — L1 user_prompt, L2 llm_turn, L3 tool_output / tool_error, L2 final_response — all under the same trace_id. The wire path is L1 → Oracle → L3; Apollo is never on the wire.
Why it matters. Lineage stitches. An operator investigating a failure can follow one request all the way through every layer by querying on a single id. The foundation for the admin inspection surface (M7) and the evaluator's outcome correlation (M10).
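For readers unfamiliar with the header, a dependency-free parser and minter in the spirit of the canonical module could look like this. Function names other than current_trace_id() are hypothetical, and the regex is deliberately strict (it treats every version like version 00), which is a simplification of the W3C rules for future versions.

```python
import re
import secrets
from contextvars import ContextVar
from typing import Optional

# version - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex), lowercase only
_TRACEPARENT_RE = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")
_trace_id_var: ContextVar[Optional[str]] = ContextVar("trace_id", default=None)


def parse_traceparent(header: Optional[str]) -> Optional[str]:
    """Return the trace id from a valid header, or None if missing or malformed."""
    if not header:
        return None
    match = _TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, _flags = match.groups()
    # The spec forbids version ff and all-zero trace or parent ids.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return trace_id


def mint_traceparent() -> str:
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"


def install_from_header(header: Optional[str]) -> str:
    """Install a trace id for this context; report 'ok', 'missing', or 'malformed'."""
    trace_id = parse_traceparent(header)
    if trace_id is not None:
        status = "ok"
    else:
        status = "missing" if not header else "malformed"
        trace_id = mint_traceparent().split("-")[1]
    _trace_id_var.set(trace_id)
    return status


def current_trace_id() -> Optional[str]:
    return _trace_id_var.get()
```

A middleware built on this would increment the missing or malformed counter based on the returned status, or return 400 in required mode.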
Milestone 7 — Admin inspection surface
What shipped. Ten admin-only endpoints mounted under /api/v1/apollo/:
- Memory CRUD: list observations with filters, get one, seed synthetic, patch metadata (tags + admin_note), forget.
- Stubs with stable shapes: GET /artifacts (empty until M8), GET /audit (empty until M9).
- Guidance preview: GET /guidance?scope=l1 or ?scope=l3:<service>. Returns what oracle would currently attach for that scope.
- SSE debug feed: GET /guidance/stream?scope=... (live subscribers see every Curator commit; empty today, M9 starts writing them).
- Subscriber registry: GET /subscribers.
- Read-only admin chat: POST /chat with action: list_tools or action: invoke. LLM-less for M7, wired to the real LLM loop in M8.
Every endpoint gates on role == "admin" via a shared FastAPI dependency. Non-admin callers see 403. All mutation actions (seed, patch, delete) are audit-logged with the admin's username.
What it proved. Operators can inspect every observation Apollo has recorded, preview what guidance would currently attach, subscribe to the live debug feed, and exercise admin tooling via a chat surface — all before any autonomous Apollo behavior is enabled. The principle "admin must be able to see before Apollo is allowed to change" is now operational.
Why it matters. This is the gate that has to sit in front of every autonomous action that comes later. M9 (curator commits), M11 (admin chat mutations), M12 (autonomous curator) all route through inspection tools that landed here.
Milestone 8 — LLM synthesis + graph-anchor drift check
What shipped. Apollo's own LLM comes online. Four pieces:
- LLM client (oracle/oracle/llm.py). Three providers, pluggable by env:
  - openai (production default): uses the existing openai SDK against any OpenAI-compatible endpoint (MiniMax hosted, Anthropic via a proxy, a local vLLM, etc.). No new dependency.
  - minimax-local (scaffolded): lazy HuggingFace transformers load of the stock MiniMax checkpoint using the canonical model-card signature. Weights resolve from the standard HF cache. For air-gapped clusters and operator-owned GPU inventory. Production-hardening knobs (APOLLO_LLM_LOCAL_MODEL_PATH, thread-pool offload, device mapping) are documented and deferred.
  - stub: canned responses for deterministic tests.
- Prompt templates (oracle/oracle/learner/prompts.py). One builder per synthesis flavor: failure-pattern extraction (fires on tool_error bursts), intent-pattern clustering (fires on user_prompt / intent_schema), prompt-shim proposal (admin-initiated). Every template demands strict JSON output.
- Synthesis dispatcher (oracle/oracle/learner/synthesis.py). Event-driven. Fires from the ingest worker whenever a triggering event type lands. Trace-id coalescing: a burst of three tool_error events on the same trace collapses to one LLM call. Bounded concurrency: APOLLO_SYNTHESIS_MAX_CONCURRENT (default 4) semaphore. Also exposed via POST /api/v1/apollo/learn for admin-initiated passes.
- Graph-anchor drift check (oracle/oracle/learner/drift.py). Four deterministic sub-checks every LLM proposal passes through before it can become an approved artifact:
  - Pattern-vs-edges: does the proposed FailurePattern reference an error edge actually present in the outcome graph?
  - Intent-vs-clusters: does the proposed IntentPattern's class match an existing intent cluster?
  - Weight swings: is the proposed weight within the z-score threshold of the existing weight distribution?
  - Trajectory coherence: does the proposal's implied direction of change align with the EWMA trajectory?
Any failing check flags the proposal as a DriftEvent instead of approving it. No LLM involved in the check itself — purely math against graph state.
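As an illustration of how one of these sub-checks can be pure arithmetic, the weight-swing check could be sketched as below. The threshold of 3.0, the use of the population standard deviation, and the conservative handling of thin or flat distributions are all assumptions; the spec names the check but not its parameters.

```python
from statistics import mean, pstdev


def weight_swing_ok(proposed: float, existing_weights: list, z_threshold: float = 3.0) -> bool:
    """True when the proposed weight sits within z_threshold deviations of the existing ones."""
    if len(existing_weights) < 2:
        return False  # assumption: too little evidence, so flag for review
    mu = mean(existing_weights)
    sigma = pstdev(existing_weights)
    if sigma == 0:
        return proposed == mu  # flat distribution: any change counts as a swing
    return abs(proposed - mu) / sigma <= z_threshold


observed = [0.4, 0.5, 0.6, 0.5]
modest = weight_swing_ok(0.55, observed)   # z is about 0.7
extreme = weight_swing_ok(0.95, observed)  # z is about 6.4
```

A proposal failing this function would be recorded as drift-flagged with the computed z-score kept as per-check detail.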
Approved proposals and drift events land on an in-memory pending list visible through GET /artifacts. M9 (Curator commits) will persist them to apollo_artifacts with versioning.
What it proved. A synthetic sequence of tool_error observations triggers one LLM call (coalesced across the burst); the LLM's JSON proposal passes through all four drift checks; consistent proposals become approved and show up on /artifacts immediately. Unsupported proposals get drift_flagged with per-check detail preserved so admins can see exactly why they were blocked.
Why it matters. This is the line where Apollo stops being a recorder and starts being an advisor. The graph-anchor principle is the critical piece: the graphs keep the LLM honest. Apollo can't hallucinate a failure pattern that no error edges support, and it can't invent an intent class no observations have clustered around.
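The coalescing and bounded-concurrency behavior of the dispatcher can be approximated with a per-trace in-flight map and a semaphore. This is one possible mechanism, not necessarily the one in synthesis.py: SynthesisDispatcherSketch and fake_llm are hypothetical, and coalescing here only covers events that arrive while a call for that trace is still pending.

```python
import asyncio


class SynthesisDispatcherSketch:
    def __init__(self, llm_call, max_concurrent: int = 4) -> None:
        self._llm_call = llm_call
        self._sem = asyncio.Semaphore(max_concurrent)  # mirrors APOLLO_SYNTHESIS_MAX_CONCURRENT
        self._in_flight: dict = {}  # trace_id -> pending asyncio.Task

    def on_event(self, trace_id: str, event_type: str):
        """Schedule one synthesis per trace; later events on the same trace reuse it."""
        if event_type != "tool_error":
            return None
        task = self._in_flight.get(trace_id)
        if task is None:
            task = asyncio.create_task(self._run(trace_id))
            self._in_flight[trace_id] = task
        return task

    async def _run(self, trace_id: str):
        try:
            async with self._sem:  # at most max_concurrent LLM calls at once
                return await self._llm_call(trace_id)
        finally:
            self._in_flight.pop(trace_id, None)


async def demo() -> list:
    calls = []

    async def fake_llm(trace_id: str) -> dict:
        calls.append(trace_id)
        await asyncio.sleep(0)
        return {"trace_id": trace_id}

    dispatcher = SynthesisDispatcherSketch(fake_llm)
    tasks = {dispatcher.on_event("t1", "tool_error") for _ in range(3)}  # burst on one trace
    tasks.add(dispatcher.on_event("t2", "tool_error"))
    await asyncio.gather(*tasks)
    return calls


llm_calls = asyncio.run(demo())
```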
Milestone 9 — Curator commits + versioning + audit
What shipped. Apollo can now mutate state. The Curator is the one subsystem empowered to persist artifact changes, and every mutation lands through the same atomic sequence: policy gate → history write → store update → audit write → SSE broadcast. Five actions: promote, demote, forget, edit, rollback. Five admin HTTP endpoints expose them; every endpoint gates on role == "admin".
Three new Elasticsearch indices back this: apollo_artifacts (current version of every artifact), apollo_artifact_history (every prior version, indefinite retention), apollo_audit (the Curator audit log). Retention is configurable via APOLLO_AUDIT_RETENTION_DAYS (default 90); the forget and rollback actions write indefinite: true records that are never purged.
The policy gate enforces six hard invariants per SPEC-14 §Curator → Disallowed actions. It refuses any action that mutates auth / guardrails / token state, widens a caller's tool access, touches another user's conversation data, calls backend services on a user's behalf, or modifies / deletes audit records. Every action passes through the gate as its first step; if the gate raises, no downstream write happens.
Rationale is required non-empty. Pydantic validation at the model layer rejects blank rationales before any I/O, because audit review is the primary substrate for admin chat (§Rationale and evidence). A blank rationale defeats the whole surface.
Rollback provenance. When an admin rolls back v3 to v1's content, the Curator writes a new v4 whose content matches v1 but whose prev_version_id points at v3 (the pre-rollback current). The provenance chain stays linear: you can always trace "v4 came from rolling back v3 to v1's content" without special-case handling in the audit reader.
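The rollback rule is easy to get backwards, so a small model of it may help. ArtifactVersion and rollback are hypothetical names, and the "v<n>" id scheme is assumed for readability; the only behavior taken from the text is that the new version copies the target's content while pointing at the pre-rollback current version.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ArtifactVersion:
    version_id: str
    content: str
    prev_version_id: Optional[str]


def rollback(history: List[ArtifactVersion], target_version_id: str) -> List[ArtifactVersion]:
    """Append a new version carrying the target's content; never rewrite old versions."""
    current = history[-1]
    target = next(v for v in history if v.version_id == target_version_id)
    new_version = ArtifactVersion(
        version_id=f"v{len(history) + 1}",
        content=target.content,              # content comes from the rollback target
        prev_version_id=current.version_id,  # lineage comes from the pre-rollback current
    )
    return history + [new_version]


history = [
    ArtifactVersion("v1", "content A", None),
    ArtifactVersion("v2", "content B", "v1"),
    ArtifactVersion("v3", "content C", "v2"),
]
history = rollback(history, "v1")
v4 = history[-1]
```

Walking prev_version_id from v4 visits v3, v2, v1 in order, so an audit reader needs no branch for rolled-back artifacts.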
SSE fan-out. Every successful Curator action broadcasts a curator_commit event to every admin watching the SSE debug feed (the channel that landed empty in M7). Admins see mutations as they happen; production ops can tail the stream during a synthesis burst to confirm nothing unexpected is being promoted.
Autonomous mode stays off. Per spec, M9 ships Curator in admin-triggered only mode — every mutation requires a human kicking it off through one of the admin endpoints. M12 flips the APOLLO_CURATOR_AUTONOMOUS switch that lets the Curator commit evolution-class proposals on its own.
What it proved. A full promote → edit → rollback → forget lifecycle works end-to-end. Every mutation is versioned, audited, policy-gated, and fan-out-broadcast. Failed actions (policy violations, nonexistent targets, reserved artifact-id namespaces) leave zero durable state behind. The four drift anchors from M8 continue to gate what can be promoted; drift_flagged proposals can't slip past the promote endpoint.
Why it matters. This is the transition from "Apollo watches and proposes" to "Apollo's proposals become persistent state." Before M9 the system could surface suggestions; after M9 admins can act on them with full version history and audit trails. The admin chat empowerment work in M11 and the autonomous Curator in M12 both mount on top of this — they are refinements to who can pull the trigger, not to what happens when the trigger is pulled.
Milestone 10 — Evaluator scoring + L3-performance amplification
What shipped. Apollo can now decay artifacts that stop working. Every active artifact carries a rolling EMA score; every observation on a trace that carried the artifact nudges the score; drops driven by L3-performance signals move the score faster than drops driven by user feedback or evaluator confidence. When an artifact's score stays below the 0.5 demote threshold long enough, the Evaluator writes a demotion recommendation; admins see it on GET /api/v1/apollo/recommendations and can act through the M9 mutation endpoints.
Five new modules under apollo/evaluator/:
- signals.py: four failure-signal detectors, one per observation flavor:
  - L3_ERROR: tool_error envelopes (magnitude 1.0, both agent- and library-observed).
  - SCHEMA_MISMATCH: tool_output whose output dict is missing fields the L1 intent schema required. Library-dark: only fires when the observed service's component_kind == "agent". Libraries have no agent-level intent contract; the check is skipped for them per SPEC-14 Q9.
  - USER_FEEDBACK: user_feedback envelopes with sentiment in ("correction", "down", "abandoned"), at magnitudes 1.0 / 0.7 / 0.4 respectively.
  - EVALUATOR_CONFIDENCE: graph-anchor confidence gaps continuous in [0, 1].
- scoring.py: per-artifact rolling EMA with weight tiers per SPEC-14 §Evaluator:
| Signal | Default weight | Sustained asymptote |
|---|---|---|
| L3 error | 3.0 | 0.0 (demotable) |
| Schema mismatch | 3.0 | 0.0 (demotable) |
| User feedback | 1.5 | 0.4 (demotable) |
| Evaluator confidence | 0.5 | 0.8 (weakest; can't demote alone) |
Contribution is normalized so a single max-magnitude tick never drops the score below 0.7 — sustained signals are what drive demotion. Full per-signal decomposition (counts + magnitude totals) is preserved on every score so the Curator's audit records can explain exactly why a score moved.
- cascade.py: three paths when an artifact's score updates:
  - Drift escalation (acute): ≥3 L3 signals within a 10-minute window → DriftEvent recommendation. Admin review required.
  - L3-dominant fast-demote (N=2 cycles): when consecutive drops are L3-driven → recommend_fast_demote.
  - Normal demote (N=5 cycles below threshold): score stayed sub-0.5 long enough → recommend_demote.

  Every non-none branch flags the artifact's upstream IntentPattern / PromptShim / SpecFragment for re-synthesis on the next trigger.
- attribution.py: trace-id → applied-artifact-ids registry. Oracle's attacher records every attachment at dispatch time (via a new optional trace_id parameter on for_l1 / for_l3_agent); the evaluator queries by trace_id when signals arrive. Entries age out after APOLLO_GRAPH_TRACE_STATE_TTL_SEC.
- recommendations.py: pending-demotion queue with replace-semantics (the latest recommendation for an artifact overrides the prior one). Admins see the queue via a new GET /recommendations endpoint; calling the M9 demote endpoint with the evaluator's score fields automatically clears the queue entry.
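The three cascade paths listed for cascade.py reduce to a small decision function. The sketch below uses the stated thresholds (3 L3 signals in 10 minutes, N=2, N=5) but the function name, its inputs, and the precedence order (drift first, then fast-demote, then normal demote) are assumptions.

```python
DRIFT_WINDOW_SEC = 600      # 10-minute window from the text
DRIFT_MIN_L3_SIGNALS = 3
FAST_DEMOTE_CYCLES = 2
NORMAL_DEMOTE_CYCLES = 5


def decide_cascade(l3_signal_timestamps, now, consecutive_l3_drops, cycles_below_threshold) -> str:
    """Pick the cascade branch for one artifact after a score update."""
    recent = [t for t in l3_signal_timestamps if now - t <= DRIFT_WINDOW_SEC]
    if len(recent) >= DRIFT_MIN_L3_SIGNALS:
        return "drift_event"            # acute burst: admin review required
    if consecutive_l3_drops >= FAST_DEMOTE_CYCLES:
        return "recommend_fast_demote"  # L3-dominant drops move faster
    if cycles_below_threshold >= NORMAL_DEMOTE_CYCLES:
        return "recommend_demote"       # score stayed under 0.5 long enough
    return "none"


burst = decide_cascade([0, 100, 200], now=300, consecutive_l3_drops=0, cycles_below_threshold=0)
fast = decide_cascade([0], now=1000, consecutive_l3_drops=2, cycles_below_threshold=2)
slow = decide_cascade([], now=0, consecutive_l3_drops=0, cycles_below_threshold=5)
quiet = decide_cascade([], now=0, consecutive_l3_drops=1, cycles_below_threshold=4)
```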
The demote audit record now carries evaluator_score, score_decomposition, and upstream_artifact_ids. The M9 schema already had these fields — M10 populates them when admins act on an evaluator recommendation. An admin auditing Apollo's Curator history can see for any demotion: what the rolling score was, how many signals of each kind had fired, and which upstream artifacts were flagged for re-synthesis.
Autonomous mode stays off. The Evaluator is an advisor at M10 — it writes recommendations, it doesn't demote on its own. M12 flips the autonomous switch.
What it proved. A synthetic run of 3 consecutive tool_error observations attributed to the same artifact produces an evaluator recommendation within 2–3 ticks. Signal weights are tunable without code changes (env-driven). Library-emitted observations correctly skip the schema-mismatch check. Bursts escalate to DriftEvent; slow drifts take the normal-demote path. Upstream refs are flagged on every recommendation so the next synthesis pass can re-examine the generators, not just the failing leaf.
Why it matters. The feedback loop closes here. Before M10 artifacts only moved forward. With the Evaluator in place, Apollo can tell an admin "pshim_xyz stopped working — 3 of the last 5 traces it guided failed; recommend demote" and that recommendation carries enough score decomposition for the admin to trust or reject it without re-deriving the math.
Milestone 11 — Admin chat empowerment
What shipped. Admins can now talk to Apollo in natural language. The admin-chat endpoint (POST /api/v1/apollo/chat with action: "chat") drives Apollo's LLM through a tool-use loop: the admin types a message, the LLM decides which tool to call, reads the result, decides whether to call another tool or compose the final prose answer. The loop is bounded by _MAX_CHAT_ITERATIONS (6) so a misbehaving LLM can't thrash.
Full tool catalog — 15 tools. Every action admins could previously take via the REST endpoints is now available as a chat tool, plus two purely conversational tools:
| Tool | Kind | Purpose |
|---|---|---|
| list_memories / get_memory / list_decisions | read | Inspect observations + audit records |
| explain_decision | read | "Why did you demote pshim_xyz?" (pulls audit + resolves evidence) |
| discuss_decision | read | Pre-loads full context (audit + current artifact + upstream flags) for a focused thread |
| promote_artifact / demote_artifact / rollback_artifact / forget_artifact / edit_artifact | mutate | Curator actions via chat |
| forget_memory | mutate | Delete an observation |
| rollback_graph | mutate | Restore a prior graph snapshot |
| trigger_synthesis | mutate | Admin-initiated synthesis pass |
| pause_curator / resume_curator | mutate | Emergency freeze of all mutations |
Every chat-initiated mutation writes an audit record with actor: "admin:<username>" and trigger: "admin_chat" so the full audit log shows whether the action came from the REST endpoints, the chat LLM loop, or (eventually M12) the autonomous curator.
Curator pause/resume. A new process-wide flag in apollo/curator/pause.py. When flipped on, every Curator mutation raises CuratorPaused at the top of its call — no history write, no audit record, no state change. Pause and resume themselves write indefinite audit records so the full pause history outlasts retention. The admin-SSE debug feed fans out curator_paused / curator_resumed events on the flip so live operators see the state change immediately.
Rollback semantics for paused state: even resume_curator is gated on admin role (not on pause itself) — the pause can only be lifted by admin action, never by any autonomous path. This is the emergency-off-switch contract.
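One way to picture the pause contract is a decorator that checks a process-wide flag before any mutation body runs. Only the CuratorPaused name comes from the text; set_paused, pause_gated, and the list standing in for the audit log are hypothetical, and the indefinite audit records written by pause and resume are left out.

```python
import functools


class CuratorPaused(RuntimeError):
    """Raised at the top of a Curator mutation while the pause flag is set."""


_paused = False


def set_paused(value: bool, actor_role: str) -> None:
    """Flip the process-wide flag; both pause and resume require the admin role."""
    global _paused
    if actor_role != "admin":
        raise PermissionError("only an admin may pause or resume the curator")
    _paused = value


def pause_gated(mutation):
    @functools.wraps(mutation)
    def wrapper(*args, **kwargs):
        if _paused:
            # Raise before the mutation body: no history write, no audit, no state change.
            raise CuratorPaused(mutation.__name__)
        return mutation(*args, **kwargs)
    return wrapper


audit_log = []


@pause_gated
def promote(artifact_id: str) -> str:
    audit_log.append(("promote", artifact_id))
    return artifact_id


promote("pshim_a")
set_paused(True, "admin")
try:
    promote("pshim_b")
    blocked = False
except CuratorPaused:
    blocked = True
try:
    set_paused(False, "curator_auto")  # an autonomous path cannot lift the pause
    lifted_by_non_admin = True
except PermissionError:
    lifted_by_non_admin = False
set_paused(False, "admin")
```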
Conversational explanation flow. explain_decision and discuss_decision are the substrate for the admin's "why did you do this?" surface. Given an audit_id / artifact_id / trace_id, the helpers load the audit record, the artifact's current state, any upstream artifacts the evaluator flagged, and the observations on referenced traces — all bundled into a single tool_result the LLM folds into its prose answer.
What it proved. A real conversational flow works end-to-end: admin types "why did you demote pshim_xyz?" → LLM calls explain_decision(artifact_id="pshim_xyz") → reads the audit record → composes a response citing the concrete reason. Mutation tools work the same way. When the curator is paused, mutation tools surface the pause as a structured tool result so the LLM explains the freeze rather than retrying blindly.
Why it matters. This is the moment Apollo becomes operationally conversational. After M11, an admin can have a conversation — ask about recent curator activity, drill into why a specific artifact was demoted, roll back a bad decision, pause the whole curator during an investigation — all through one chat surface. The tool-use loop + policy gate + audit trail make this safe.
Milestone 12 — Autonomous Curator + drift prevention tuning
What shipped. Apollo's Curator can now commit without admin intervention. A periodic sweep (apollo/curator/auto.py, default 30s cadence) reads both the synthesis pending list and the evaluator recommendation queue, distinguishes evolution-class work (safe to auto-commit) from drift-class work (still needs admin review), and acts on the evolution-class cases with actor="curator_auto" and trigger="autonomous_curator".
The split:
| Source | Evolution-class → auto | Drift-class → admin-only |
|---|---|---|
| Synthesis pending list | status: "approved" proposals (drift-check passed) | status: "drift_flagged" proposals |
| Evaluator recommendation queue | kind: "demote" / "fast_demote" | kind: "drift_event" (acute bursts) |
Deterministic artifact ids. When the autonomous promoter commits a new proposal, it derives the artifact_id from the proposal body: fp_<hash(service+tool+signature)> for FailurePattern, ip_<hash(intent_class+prompt_shape)> for IntentPattern, etc. The same logical pattern converges on the same id across repeated synthesis passes — versions accumulate on one artifact instead of proliferating into many.
The same guards. Every auto-commit flows through the M9 Curator atomic sequence (policy gate → history → store → audit → SSE broadcast). The pause flag gates autonomous commits the same way it gates admin-triggered ones. Autonomous mode flips via APOLLO_CURATOR_AUTONOMOUS=true (default false so prior milestones' behavior is preserved on upgrade).
The safety seam. Drift-class work (the acute L3 bursts and the drift-check flags from M8) is deliberately retained for admin review. An autonomous Curator that also committed drift-class proposals would be fighting the drift-check's entire purpose.
What it proved. A synthesis pending list with 10 approved proposals becomes 10 committed artifacts in one sweep, each with actor: "curator_auto". Mixed queues auto-commit only the approved ones. Pausing the curator mid-sweep stops it cleanly. The same proposal body always resolves to the same artifact_id.
Why it matters. This is the inflection point where Apollo stops needing admin attention to get its basic job done. With M12, the admin's role shifts from approver of every mutation to reviewer of drift cases + auditor after the fact.
Milestone 13 — Maintenance + /stats polish + degraded-emitter reporting
What shipped. The ops-readiness milestone. Three pieces:
1. Hourly maintenance loop (apollo/maintenance.py). Runs on APOLLO_MAINTENANCE_INTERVAL (default 1h). On each pass:
- Scans apollo_observations, apollo_audit, and apollo_graph_snapshots and runs delete_by_query on every doc whose expires_ts < now().
- Indefinite audit records (forget, rollback, pause_curator, resume_curator, graph_rollback) carry expires_ts: null and are never touched.
- Metrics emitted on every run: apollo_maintenance_last_run_ts + apollo_maintenance_docs_deleted_total{index}.
- Snapshot coarsening (hourly → daily → weekly tiers per SPEC-14 Q5) is scaffolded; M13 ships the hook without the tier-generation logic (deferred until production accumulates enough data to warrant it).
2. /stats surface expansion. Two new top-level keys:
- degraded_emitters — services whose apollo_ingest_last_ingest_ts is older than APOLLO_INGEST_STALE_WARN_SEC (default 300).
- intent_schema_coverage — rolling percentage of traces in the last 24 hours that carried an intent_schema observation. Null when there's no data in the window. Substrate for the eventual flip to APOLLO_REQUIRE_INTENT_SCHEMA=true in Phase 3.
3. Extended audit filters. GET /api/v1/apollo/audit supports two new query params:
- trigger — term filter on the mutation trigger (admin_endpoint, admin_chat, autonomous_curator, etc.).
- artifact_type — prefix filter on artifact_id (fp_, ip_, ps_, sf_).
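The purge pass from item 1 can be sketched as a request-body builder plus an in-memory mirror of the same predicate. This assumes expires_ts is stored as an ISO-8601 string; the constant and function names are hypothetical. The body uses the standard Elasticsearch range query, which does not match documents whose field is null or missing, so indefinite records fall outside the purge without a special case.

```python
from datetime import datetime, timezone

# Indices named in the maintenance description above.
PURGE_INDICES = ("apollo_observations", "apollo_audit", "apollo_graph_snapshots")


def build_purge_body(now: datetime) -> dict:
    """Body for delete_by_query: every doc whose expires_ts is before now."""
    return {"query": {"range": {"expires_ts": {"lt": now.isoformat()}}}}


def is_expired(doc: dict, now: datetime) -> bool:
    """In-memory mirror of the range query, handy for unit tests."""
    expires = doc.get("expires_ts")
    if expires is None:
        # Indefinite records carry expires_ts: null and are never purged.
        return False
    return datetime.fromisoformat(expires) < now
```

A test can pin now to a fixed datetime, which matches the injectable-now design the maintenance pass already uses.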
What it proved. Expired observations and routine audit records purge cleanly; indefinite records survive the sweep. The /stats surface flags stale services without alerting on services that simply haven't been observed yet. The audit endpoint's filters compose (AND semantics).
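The two /stats keys reduce to small pure functions. The sketch below is an assumption about their shape, not the code in apollo/maintenance.py: it takes already-collected inputs (a per-service last-ingest map and two trace counts) instead of reading metrics or Elasticsearch.

```python
from typing import Optional


def degraded_emitters(
    last_ingest_ts: dict[str, float],
    now: float,
    stale_after_sec: float = 300.0,
) -> list[str]:
    """Services whose last ingest is older than the stale threshold.

    A service that has never been observed has no entry in the map, so it
    is not flagged.
    """
    return sorted(
        service
        for service, ts in last_ingest_ts.items()
        if now - ts > stale_after_sec
    )


def intent_schema_coverage(traces_total: int, traces_with_schema: int) -> Optional[float]:
    """Percentage of traces carrying an intent_schema; None with no data."""
    if traces_total == 0:
        return None
    return 100.0 * traces_with_schema / traces_total
```

Returning None rather than 0.0 for an empty window keeps "no traffic" distinguishable from "traffic with zero coverage".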
Why it matters. An always-on system that grows state forever isn't operable. M13 is the janitor that lets Apollo run indefinitely. It's also the final piece for Phase 3 readiness — once intent_schema_coverage is high enough in production, the APOLLO_REQUIRE_INTENT_SCHEMA and APOLLO_REQUIRE_TRACEPARENT flags can flip to true without introducing data loss.
Milestone 14 — Subscriber LLM consumption (cortex L3)
The gap M14 closes. Through M0–M13, oracle attached apollo_guidance to every /chat response (L1) and every outbound MCP dispatch bound for an agent-kind L3 service. The wire was carrying guidance, but no subscriber actually read it. Cortex's MCP tool signatures didn't declare an apollo_guidance parameter, so FastMCP silently stripped the field before the handler ran. The brain was thinking; the body wasn't listening.
What shipped. M14 wires the consumption side into cortex (L3) per the contract locked in Q20:
- Cortex (L3): request-scoped ApolloGuidanceCache populated at the top of _handle_tools_call. apollo_guidance is popped from the inbound arguments dict (so FastMCP no longer strips it) and the cache is exposed to tool implementations via a ContextVar. Tools that internally run an LLM read the cache's accessors before composing their per-call system prompt and tool catalog.
- SDK distribution. Cortex imports ApolloGuidanceCache directly from axonis.core.oracle.guidance_cache, the canonical source per Q15. Cortex's pyproject.toml declares axonis-core>=0.1.0, mirroring oracle's pattern. (An earlier branch of M14 vendored the module locally to keep the agent lightweight, but cortex already imported axonis.core.llm.LLMSpec for its narrative tool, so the vendoring decision didn't pay for itself; the import was unified on the canonical path. Q15 still allows vendoring for future subscribers whose dep posture differs from cortex's.)
- Failure posture (Q20): cache.update(None) is a legal no-op; missing/malformed guidance never blocks the tool. Without guidance, accessors return empty lists / None and the tool's prompt builder behaves exactly as it did pre-M14.
Why beacon (L1) is deferred. Beacon currently has no HTTP connection to oracle — its MCP_SERVER_URL defaults to http://localhost:8000/mcp (cortex direct), so attached apollo_guidance has no path into beacon's process today. Once the beacon↔oracle connection lands, beacon's wiring follows the same SDK pattern as cortex.
Why parallax is deferred. Parallax's wiring follows cortex's pattern verbatim (MCP handler argument pop + request-scoped cache contextvar), but is deferred until parallax's own Phase 1 onboarding lands.
What it proved. Integration tests now assert the system prompt sent to a downstream LLM call inside a cortex tool observably grows when guidance is present and is unchanged when absent — the assertion the M0–M13 tests stopped short of.
Why it matters. Without M14, every artifact Apollo synthesized through M8–M12 was attached to envelopes and discarded by the recipients. M14 is the difference between a system that records and reasons and a system that actually steers behavior on the L3 side.
The journey, summarized
Apollo started as a design spec in M0 and ended as a live observation/learning/guidance system whose advice is read by L3 LLMs at runtime in M14 (L1 consumption by beacon remains deferred). Fourteen milestones, each a reversible slice that left oracle functional whether or not the milestone shipped. No rewrites. No regressions.
| Phase | Milestones | What it delivered |
|---|---|---|
| Phase 1 — Observe + ground | M0–M6 | Ingest, graphs, guidance attach, subscriber SDK, oracle-sole-observer, trace propagation |
| Phase 2 — Synthesize + advise | M7–M10, M14 | Admin inspection, LLM synthesis + drift check, Curator commits, Evaluator scoring; M14 retroactively closes Phase 2's Injection Channel commitment on the L3 side by wiring cortex consumption (the L1 side is deferred) |
| Phase 3 — Empower + maintain | M11–M13 | Admin chat empowerment, autonomous Curator, hourly maintenance |
Cumulative capability matrix (as of M14 — final)
| Capability | Status |
|---|---|
| Users can call /chat; responses carry apollo_guidance field | ✅ M3 |
| Every /chat turn produces a full observation lineage under one trace_id | ✅ M5 + M6 |
| Cortex (L3) consumes attached guidance via ApolloGuidanceCache | ✅ M14 (SDK M4; oracle attach M5; consumption wiring M14) |
| Beacon (L1) consumes attached guidance via ApolloGuidanceCache | ⏸ Deferred: gated on a beacon↔oracle connection (no path today) |
| L3 libraries correctly skip guidance injection | ✅ M3 |
| Observations land in apollo_observations with dedup | ✅ M1 |
| Deterministic graph updates on every observation | ✅ M2 |
| Hourly graph snapshots | ✅ M2 |
| Admin can inspect observations, graph state, and stats | ✅ M7 |
| Admin SSE debug feed is live (Curator commits fan out per M9; pause/resume per M11; autonomous commits per M12) | ✅ M7 (channel) + M9 (commits) |
| Apollo's LLM fires on triggering events, proposes artifacts | ✅ M8 |
| Proposals gated by four-check graph-anchor drift | ✅ M8 |
| Admin can trigger synthesis manually via POST /learn | ✅ M8 |
| Trace propagation through L1 → Oracle → L3 | ✅ M6 |
| Non-admin callers blocked from every admin endpoint | ✅ M7 |
| APOLLO_EMITTER_ENABLED=false kills emission cleanly | ✅ M5 |
| Local MiniMax via HuggingFace scaffolded | ✅ M8 |
| Admin can promote approved proposals into active artifacts | ✅ M9 |
| Every Curator mutation is versioned (apollo_artifact_history) and audited (apollo_audit) | ✅ M9 |
| Curator policy gate blocks six hard invariants | ✅ M9 |
| Admin can edit / demote / forget / rollback artifacts | ✅ M9 |
| forget and rollback write indefinite: true audit records | ✅ M9 |
| Curator commits fan out to admin SSE debug feed | ✅ M9 |
| Per-artifact rolling score driven by four failure signals | ✅ M10 |
| L3 signals carry amplified weight; fast-demote after N=2 L3-dominant cycles | ✅ M10 |
| Acute L3 bursts escalate to DriftEvent rather than silent demotion | ✅ M10 |
| Upstream artifacts flagged for re-synthesis on every recommendation | ✅ M10 |
| Admins see the recommendation queue on GET /recommendations | ✅ M10 |
| Demote audit records carry evaluator_score + full per-signal decomposition | ✅ M10 |
| Admin-chat LLM loop: natural language drives tool selection | ✅ M11 |
| explain_decision / discuss_decision surface audit + evidence conversationally | ✅ M11 |
| Chat tools cover every mutation | ✅ M11 |
| pause_curator / resume_curator: emergency off-switch with indefinite audit | ✅ M11 |
| Chat-initiated mutations audited with actor: "admin:<username>" + trigger: "admin_chat" | ✅ M11 |
| Autonomous Curator auto-commits evolution-class proposals + recommendations | ✅ M12 |
| Drift-class work retained for admin review | ✅ M12 |
| Deterministic artifact_id derivation | ✅ M12 |
| Auto-commits gated by APOLLO_CURATOR_AUTONOMOUS flag + pause state | ✅ M12 |
| Hourly maintenance job: expired-doc purge via delete_by_query | ✅ M13 |
| Indefinite audit records never purged | ✅ M13 |
| /stats surfaces degraded_emitters + intent_schema_coverage | ✅ M13 |
| GET /audit supports trigger + artifact_type filters | ✅ M13 |
Test totals (live counts; refresh on each milestone boundary): see the captured runs under oracle/docs/proof/consumption/l3-cortex-in-process/. At time of M14 landing: 416 oracle + 18 cortex M14 + 16 oracle graph + 138 oracle learning-loop tests passing; cortex full suite at 3026 / 0 / 307 (excluding two pre-existing orphan files). Specific repo counts shift as suites grow.
What a live /chat request looks like today
Note on the L1 caller. "L1 caller" means whatever sends POST /api/v1/chat to oracle: curl, an integration test, or a direct API client today; beacon once a beacon↔oracle connection lands. /api/v1/chat is oracle's user-facing chat surface, driven by oracle's own LLM tool-use loop. It is distinct from /api/v1/apollo/chat, which runs Apollo's separate MiniMax LLM for admin chat with Apollo.
Tracing one request from any L1 caller to response:
- L1 → Oracle. The L1 caller sends POST /chat with a user message and a traceparent header. TraceparentMiddleware (M6) parses the header and installs the trace_id on a ContextVar for the lifetime of the request.
- Handler. /chat extracts the prompt and calls emit_user_prompt(...), which drops an envelope on Apollo's in-process queue. An llm_turn observation fires on each LLM cycle inside the tool-use loop.
- Tool call. The LLM decides to call a cortex tool. Oracle looks up the tool; cortex is component_kind="agent", so attacher.for_l3_agent(...) is called to produce the current apollo_guidance payload (M3). The payload is injected into the MCP arguments alongside llm_spec. traceparent is forwarded on the outbound HTTP request.
- L3. Cortex's MCP handler pops apollo_guidance from arguments, populates a request-scoped ApolloGuidanceCache via the contextvar (M14), runs the tool, and responds over MCP.
- Oracle observes. The MCP response returns. Oracle emits tool_output (or tool_error) in-process (M5) under the same trace_id.
- LLM loop continues until the model returns text. emit_final_response fires. The response body is assembled with apollo_guidance attached (M3). L1 receives it.
- In the background. Apollo's ingest workers persist each observation. Each passes through the deterministic extractors and updates the five Decision Graphs (M2). For qualifying event types, the synthesis engine schedules a coalesced LLM pass (M8); any proposal goes through the four-check drift gate and lands on the pending list. M12's autonomous-curator sweep promotes evolution-class proposals on its own; drift-class work waits for admin review.
Everything that happens between button-click and response is on the hot path. Everything that updates graphs, triggers synthesis, or writes audit records happens on background tasks that can't stall the request.
Roadmap (future work beyond M14)
Follow-up work tracked separately (full operational detail in §Future Improvements):
- Beacon ↔ Oracle connection. M14 deferred the L1 subscriber wiring because beacon's MCP_SERVER_URL defaults to cortex direct. Tracked in §Future Improvements §2.3. Once the connection design lands, beacon's L1 consumption follows the cortex SDK pattern.
- Parallax onboarding. Same wiring pattern as cortex's M14 when parallax's Phase 1 work lands.
- Additional L3 emitter onboarding. UDS, athena, testament, titan, rest/fedai-rest: each onboards via either in-process relay or direct ApolloClient POST. No Apollo code change required.
- Production-grade minimax-local LLM provider. The scaffold landed in M8; deferred work: APOLLO_LLM_LOCAL_MODEL_PATH, thread-pool offload, device/quantization knobs, pre-pull readiness gate, streaming tokens.
- Required-mode flips (APOLLO_REQUIRE_INTENT_SCHEMA, APOLLO_REQUIRE_TRACEPARENT). Once M13's coverage stat is steadily high, operators can flip these to true.
- Snapshot tier generation. M13 shipped the coarsening hook; the actual hourly→daily→weekly snapshot generation is deferred until production data volume warrants it.
- Federation of artifacts. axonis-core's UDS pattern supports federation; Apollo's Curator will use it to share high-confidence artifacts across federated deployments in a later phase.
Post-M15: Prioritization Layers
After M14 closed the consumption loop and M15 wired oracle's own chat LLM as the L2 subscriber, the next pressure surfaced empirically: with the active artifact set growing, the attacher had no way to prefer the better artifacts. A multi-artifact stress test showed three problems on a single attach: silent drops at the cap, recency-eviction beating real quality, and zero observability for what was held back.
The response was a seven-layer rebuild of the selection path:
- Capped lineage: every dropped artifact gets a kind: "capped" row, so admins can investigate cases like "this shim matched 47 times but was never sent."
- Smarter sort key: five-tier priority (evaluator quality → synthesis confidence → applicability specificity → weight → recency) replaces the old "default weight 1.0 → recency wins."
- Signal preservation at promote: a contract test pins that evaluator_score, confidence, and weight survive the metadata strip.
- Real signals flowing: 4-A wires the evaluator to write scores back to the artifact; 4-B teaches synthesis to emit a confidence per proposal.
- Deepened rationale: rationale_summary now names artifact IDs (attached + capped), and a new per-artifact aggregation query answers "how often is this artifact being shadowed?"
- Similarity: embeddings at promote (6-A) drive a promote-time advisory (6-B) and a periodic curator-time merger sweep (6-C).
Each layer is independently disabled by an env flag and degrades cleanly when its prerequisites aren't present. The full contract is in SPEC-PLATFORM-14-APOLLO.md §Prioritization Layers; the historical changelog of what shipped is in docs/APOLLO-FUTURE-IMPROVEMENTS.md §12.
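The five-tier sort and the capped-lineage rows can be sketched together. This is an illustration under stated assumptions, not the attacher's code: the field names specificity and as_of-as-epoch-seconds, the treatment of a missing score as 0.0, and the shape of the capped row beyond kind: "capped" are all hypothetical.

```python
def sort_key(artifact: dict) -> tuple:
    """Five-tier priority; values are negated so the best sorts first."""
    return (
        -(artifact.get("evaluator_score") or 0.0),  # tier 1: evaluator quality
        -(artifact.get("confidence") or 0.0),       # tier 2: synthesis confidence
        -artifact.get("specificity", 0),            # tier 3: applicability specificity
        -artifact.get("weight", 1.0),               # tier 4: weight
        -artifact.get("as_of", 0.0),                # tier 5: recency
    )


def select(artifacts: list[dict], cap: int) -> tuple[list[dict], list[dict]]:
    """Return (attached, capped_rows); nothing is dropped silently."""
    ranked = sorted(artifacts, key=sort_key)
    attached = ranked[:cap]
    capped_rows = [
        {"kind": "capped", "artifact_id": a["id"]} for a in ranked[cap:]
    ]
    return attached, capped_rows
```

With this ordering, an older artifact with a high evaluator score outranks a newer one with a low score, which is the recency-eviction problem the stress test surfaced.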
The remaining work is the cap-defaults empirical study — once production telemetry has accumulated, revisit whether the per-type caps and similarity thresholds need tuning. That's a data-collection task, not code.
End-to-End Scenario
A presentation-ready walkthrough of a live request flowing L1 caller → Oracle/Apollo → Cortex → Oracle/Apollo → L1 caller. This section accompanies the automated integration test at oracle/apollo/tests/test_integration_beacon_to_l3.py and explains what the test proves about the real production flow.
Note on the L1 caller. The hop diagram below names "beacon" as the L1 caller, but beacon does not currently call oracle. The L1 hop today is exercised only by curl or other direct callers against POST /api/v1/chat, oracle's user-facing chat surface, driven by oracle's own LLM tool-use loop. POST /api/v1/apollo/chat is a separate admin-scoped surface that runs Apollo's independent MiniMax LLM and is not the L1 path. The flow below describes the /api/v1/chat path; beacon onboards once a beacon↔oracle connection is wired.
Audience. Engineers demoing the system, reviewers verifying the architecture holds, and anyone who wants to see what Apollo actually does on one request.
The hop sequence
beacon oracle + apollo cortex
│ │ │
│ POST /api/v1/chat │ │
│ Authorization: Bearer … │ │
│ traceparent: 00-<tid>-… │ │
│ body: {message, convo} │ │
├───────────────────────────▶│ │
│ │ │
│ (1) TraceparentMiddleware │
│ validates / mints, sets ContextVar │
│ │ │
│ (2) /chat handler: │
│ oracle.hooks.chat.emit_user_prompt(…) │
│ → apollo_observations [L1-origin] │
│ │ │
│ (3) ToolExecutor.run(…) — LLM tool-use loop │
│ • llm.router.complete(…) │
│ • oracle.hooks.chat.emit_llm_turn(…) │
│ → apollo_observations [L2-origin] │
│ │ │
│ (4) _call_backend_tool: │
│ • attacher.for_l3_agent(…) → apollo_guidance │
│ • tool_args["apollo_guidance"] = {…} │
│ • outbound httpx POST to cortex /agentspace/mcp │
│ + traceparent header forwarded │
│ ├─────────────────────────────▶│
│ │ │
│ │ cortex extracts │
│ │ apollo_guidance, │
│ │ feeds its local │
│ │ ApolloGuidanceCache, │
│ │ runs the tool, │
│ │ responds │
│ │◀─────────────────────────────┤
│ (5) After dispatch returns: │
│ oracle.hooks.chat.emit_tool_output(…) │
│ → apollo_observations [L3-origin, │
│ emitted by oracle on cortex's behalf] │
│ │ │
│ (6) oracle.hooks.chat.emit_final_response(…) │
│ → apollo_observations [L2-origin] │
│ │ │
│ (7) attacher.for_l1(…) → apollo_guidance block │
│ │ │
│ ◀──────────────────────────┤ │
│ 200 OK │ │
│ { response, tool_calls, │ │
│ apollo_guidance: {…} } │ │
│ │ │
│ beacon's local │ │
│ ApolloGuidanceCache │ │
│ .update(apollo_guidance) │ │
Everything inside the oracle/apollo box runs in one process on one event loop. The only network hops are L1 → Oracle and Oracle → L3 (and the return legs). Apollo itself is never on the wire.
The integration test — what each scenario proves
oracle/tests/test_integration_beacon_to_l3.py exercises this flow end-to-end in-process. A fake cortex ASGI app stands in for the real service; httpx is routed through it via httpx.ASGITransport. All of oracle's real code runs — auth dependency (stubbed payload), TraceparentMiddleware, /chat handler, ToolExecutor loop, Apollo emission helpers, attacher, guidance cache SDK. No mocks on the Apollo surface itself.
Scenario 1 — happy path (TestHappyPath)
Beacon sends a well-formed request; cortex answers successfully. The test asserts:
- 200 OK with response text and tool_calls array populated.
- Cortex received apollo_guidance in its MCP arguments dict. Oracle's M3 attacher ran at dispatch time; the payload has the AttachedGuidance shape (as_of + artifacts + rationale_summary).
- Cortex received traceparent in HTTP headers. M6 propagation intact.
- Response body carries apollo_guidance: beacon's local ApolloGuidanceCache.update(body.apollo_guidance) succeeds.
- Observation lineage stitches under one trace_id: user_prompt, llm_turn, tool_output, final_response all land in apollo_observations with the same W3C trace-id the caller sent.
- tool_output envelope's arguments has apollo_guidance stripped. Apollo doesn't observe its own injection as part of the caller's intent; the strip happens in emit_tool_output before enqueueing.
Scenario 2 — L3 failure (TestL3FailurePath)
Same request shape, but the fake cortex returns an error. The test asserts:
- 200 OK from oracle (the request itself didn't fail; the tool did).
- Observations include tool_error instead of tool_output: oracle's M5 detect-and-emit logic correctly identified the JSON error envelope and routed to the error helper.
- Trace stitching still holds on the failure path: same trace_id across the error observation.
Scenario 3 — missing traceparent (TestMissingTraceparent)
Beacon forgets to send the header. The test asserts:
- 200 OK (best-effort mode).
- TraceparentMiddleware minted a fresh 32-hex trace-id.
- Every observation in the run carries the same minted trace-id: lineage stitches even when the caller didn't mint one.
(The apollo_missing_traceparent_total counter increments; the test doesn't assert on counters specifically, but the production flow exercises that path.)
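The best-effort behavior in this scenario can be sketched as a parse-or-mint helper. The header layout (version, 32-hex trace-id, 16-hex parent-id, 2-hex flags) and the invalid cases (version ff, all-zero ids) follow the W3C Trace Context format; the function name and the (trace_id, minted) return shape are hypothetical and not taken from TraceparentMiddleware.

```python
import re
import secrets
from typing import Optional

# version - trace-id - parent-id - flags, all lowercase hex.
_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def trace_id_from_header(header: Optional[str]) -> tuple[str, bool]:
    """Return (trace_id, minted). Mint a fresh id when the header is
    missing or invalid, so lineage still stitches under one trace-id."""
    if header:
        match = _TRACEPARENT_RE.match(header.strip())
        if match:
            version, trace_id, parent_id, _flags = match.groups()
            valid = (
                version != "ff"
                and set(trace_id) != {"0"}
                and set(parent_id) != {"0"}
            )
            if valid:
                return trace_id, False
    # 16 random bytes rendered as 32 lowercase hex characters.
    return secrets.token_hex(16), True
```

The minted flag is the natural place to increment a counter such as apollo_missing_traceparent_total, though that wiring is not shown here.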
Scenario 4 — subscriber SDK shape contract (TestCortexGuidanceConsumption)
The test takes the apollo_guidance payload cortex received and feeds it into axonis.core.oracle.guidance_cache.ApolloGuidanceCache.update(…) directly. Asserts the payload has the expected {as_of, artifacts, rationale_summary} shape and the SDK accepts it without error — proving the on-wire contract matches the consumer SDK even in the empty-artifact-set case (no real artifacts promoted).
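The shape half of this contract can be sketched without the SDK. The helper below is hypothetical; it checks only the three top-level keys named above and that artifacts is a list, which is enough to cover the empty-artifact-set case.

```python
REQUIRED_KEYS = ("as_of", "artifacts", "rationale_summary")


def check_guidance_shape(payload: object) -> list[str]:
    """Return a list of problems; an empty list means the shape matches."""
    if not isinstance(payload, dict):
        return ["payload is not an object"]
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in payload]
    if "artifacts" in payload and not isinstance(payload["artifacts"], list):
        problems.append("artifacts is not a list")
    return problems
```

Returning the full list of problems rather than raising on the first one makes a failing contract test easier to read.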
What the test doesn't cover (scoped differences from live)
The integration test is authoritative for oracle's request path + Apollo integration under real code, but it does NOT exercise:
| Not covered | Why | How to cover |
|---|---|---|
| Real JWT / Keycloak signature verification | Requires live SSO | Live deployment smoke test |
| Real Elasticsearch write/read | Adds flakiness + CI cost | Live deployment with seeded index templates |
| Real Redis conversation persistence | Same | Live deployment |
| Real LLM output quality | server.llm.router.complete is stubbed | Deploy with an LLM key and run through beacon's chat UI |
| Real network latency / TLS / cross-host traceparent | All in-process | Multi-host staging scenario |
| Cortex's real tool implementations | Fake cortex only returns canned responses | Run real cortex with domain packs loaded |
| Long-running ingest worker behavior | The test bypasses the queue (writes directly) | Live ES-backed deployment |
For a live demo, the companion script below mirrors the test's assertions against real running services.
Live scenario script (running services)
Assumes:
- Oracle running at localhost:8001 (per developers-environment/oracle/oracle.env).
- Cortex running at localhost:8000 (or wherever ORACLE_SERVICES points at registration).
- Parallax optional but recommended for variety.
- Elasticsearch + Redis up and reachable from oracle.
- A valid Keycloak user token exported as $USER_TOKEN, and an admin token as $ADMIN_TOKEN.
- Every APOLLO_* variable comes from developers-environment/conf/development.axonis.ai.env, the canonical home for Apollo config, shared across oracle, parallax, cortex, and beacon. The env file mirrors every variable Apollo's settings.py reads, grouped by subsystem (LLM, ingest, graphs, evaluator, curator, audit, maintenance, trace propagation, drift detection). For a live-LLM demo, override APOLLO_LLM_PROVIDER=openai + APOLLO_LLM_BASE_URL + APOLLO_LLM_API_KEY (see SPEC-PLATFORM-14-APOLLO.md §Apollo's LLM). For a stub-LLM plumbing-only run, override APOLLO_LLM_PROVIDER=stub.
Step 1 — confirm services are up
curl -sf http://localhost:8001/health | jq
curl -sf http://localhost:8000/health | jq # cortex
curl -sf http://localhost:8001/service-info | jq # oracle's own info
curl -sf -H "Authorization: Bearer $ADMIN_TOKEN" \
http://localhost:8001/api/v1/apollo/stats | jq '.milestone, (.metrics | keys[])' | head -20
Expected: oracle returns "status": "ok", cortex returns healthy, /stats returns the current milestone (M14 at time of writing) plus every Apollo counter at zero.
Step 2 — send the beacon request
TRACE_ID=$(openssl rand -hex 16)
SPAN_ID=$(openssl rand -hex 8)
TP="00-${TRACE_ID}-${SPAN_ID}-01"
curl -sS -X POST http://localhost:8001/api/v1/chat \
-H "Authorization: Bearer $USER_TOKEN" \
-H "traceparent: $TP" \
-H "Content-Type: application/json" \
-d '{
"message": "find recent activity for customer cust_42",
"conversation_id": "demo_1"
}' | jq '. | {response, tool_calls, apollo_guidance}'
Expected: a JSON response with response, tool_calls (non-empty when the tool-use loop fired), and apollo_guidance populated with {as_of, artifacts, rationale_summary}. Pre-promote, artifacts is [].
Step 3 — tail the admin SSE feed (in another terminal)
curl -N -H "Authorization: Bearer $ADMIN_TOKEN" \
"http://localhost:8001/api/v1/apollo/guidance/stream?scope=*"
Nothing prints yet (no Curator commits). Leave this open for Step 6.
Step 4 — verify the lineage landed
# Every observation oracle emitted for this request.
curl -s -H "Authorization: Bearer $ADMIN_TOKEN" \
"http://localhost:8001/api/v1/apollo/memories?trace_id=${TRACE_ID}&limit=50" \
| jq '[.observations[] | {event_type, service, payload: (.payload | keys)}]'
Expected: array includes user_prompt, one-or-more llm_turn, tool_output (or tool_error), final_response — all with the same trace_id. tool_output.service is the L3 service oracle dispatched to (e.g., "cortex").
Step 5 — check synthesis triggered + drift-checked a proposal
curl -s -H "Authorization: Bearer $ADMIN_TOKEN" \
http://localhost:8001/api/v1/apollo/artifacts | jq '.count'
Expected: at least one pending proposal (if the message hit a failure-pattern or intent-pattern synthesis path). Each proposal has status: "approved" or status: "drift_flagged" with per-check drift detail.
Step 6 — admin promotes the proposal (M9 + M11)
Natural-language path:
curl -sS -X POST http://localhost:8001/api/v1/apollo/chat \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"action": "chat",
"message": "promote the approved FailurePattern proposal you just made"
}' | jq '. | {response, tool_calls}'
Apollo's admin-chat LLM picks promote_artifact, writes the audit record, and narrates the result. In the other terminal, the SSE feed prints:
data: {"event": "curator_commit", "action": "promote", "artifact_id": "fp_<hash>", "actor": "admin:<you>", "ts": "..."}
Direct REST path:
PROP_ID=$(curl -s -H "Authorization: Bearer $ADMIN_TOKEN" \
http://localhost:8001/api/v1/apollo/artifacts | jq -r '.pending[0].id')
curl -sS -X POST http://localhost:8001/api/v1/apollo/artifacts/fp_live_demo/promote \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d "{\"proposal_id\": \"${PROP_ID}\", \"rationale\": \"demo promote\"}" | jq
Step 7 — re-run the same /chat to see guidance flow through
Re-send Step 2's request. This time:
- apollo_guidance.artifacts contains the promoted FailurePattern.
- Cortex's inbound MCP dispatch carries the artifact in arguments.apollo_guidance.artifacts.
- Cortex's MCP handler (M14) pops the field and populates a request-scoped ApolloGuidanceCache via the contextvar; the cortex tool's next LLM call folds get_active_failure_patterns(...) into its system prompt. Captured proof of this end-to-end: oracle/docs/archive/M14-CORTEX-CONSUMPTION-PROOF.md + oracle/docs/proof/.
Step 8 — audit the trail
curl -s -H "Authorization: Bearer $ADMIN_TOKEN" \
"http://localhost:8001/api/v1/apollo/audit?artifact_id=fp_live_demo" \
| jq '[.records[] | {action, actor, trigger, rationale, indefinite}]'
Expected: one promote record with actor: "admin:<you>", trigger: "admin_chat" (or admin_endpoint), non-empty rationale.
What a presenter walks through
Suggested demo flow (15 minutes):
- Open Step 1 and Step 2: show oracle healthy, send one /chat, surface the response with the apollo_guidance field.
- Open Step 4: show /memories?trace_id=... returning the full lineage under one trace_id. This is Apollo observing.
- Open Step 5: show /artifacts returning one or more pending proposals. This is Apollo synthesizing.
- Open Step 6 with the chat path: admin says "promote it"; the LLM calls promote_artifact; the SSE feed prints curator_commit live. This is Apollo advising + admin acting.
- Open Step 7: re-run the /chat; apollo_guidance now has the promoted artifact; cortex sees it in its tool arguments. This is Apollo's learning reaching L3 via oracle.
- Open Step 8: audit record shows actor, rationale, and indefinite flag semantics. This is Apollo's trail.
Optional: run the in-process integration test as a "here's how we prove this in CI" companion:
. .venv/bin/activate
pytest oracle/tests/test_integration_beacon_to_l3.py -v -s
4 scenarios, all deterministic, all green.
Cross-references
- Contract: §Ingest Semantics, §Injection Channel, §Trace Propagation
- Build order: §Implementation Plan M5 (oracle-sole-observer), M6 (traceparent), M7 (admin surface), M9 (curator)
- Narrative: §Design Journey "What a live /chat request looks like today"
- Runtime call graph: §Technical Overview "The /chat call graph"
File & Function Inventory
Map of every file that exists for Apollo (Apollo's own apollo/ tree plus Apollo-specific files in oracle's server tree). Each file is listed under its package; each entry has a one-line file description, then a bullet list of its public functions/classes with a 1-2 sentence description per item.
apollo/ package root
apollo/__init__.py — Package marker; module docstring states Apollo is the observation/learning/guidance layer mounted into oracle at /api/v1/apollo/*. No re-exports.
apollo/app.py — Builds Apollo's FastAPI sub-app and owns the five background loops (snapshot, autonomous curator, maintenance, synthesis sweep, coalescer) that oracle's Starlette lifespan starts and stops.
- startup() — async; called from oracle's lifespan. Bootstraps indices, wires the active-artifact source, prewarms the active-set cache, starts the ingest queue/workers, and spawns the five periodic tasks (including the Layer 6-C coalescer loop).
- shutdown() — async; signals each loop's stop event (snapshot, curator_auto, maintenance, synthesis_sweep, coalescer), awaits with a bounded timeout, drains the admin-SSE hub, then stops the ingest workers.
apollo/artifacts.py — Pydantic schemas for every artifact type listed in §Memory Model → Artifact types, plus the AttachedGuidance envelope that rides on outbound responses/dispatches. Defines which types are attachable vs admin-only.
- ArtifactType(Enum) — full set of typed artifacts Apollo may produce; adding a new type requires extending this enum and adding a *Content class.
- ATTACHABLE_TYPES — frozenset of the six artifact types the SDK's canonical accessors consume; others are admin/audit-only.
- Applicability(BaseModel) — scope filter (intent_class, layer, service_name, tool_name, tags) the selector evaluates per artifact.
- PromptShimContent, SpecFragmentContent, ToolPairingHintContent, FailurePatternContent, ServiceConnectionHintContent, IntentPatternContent — content models for the six attachable types.
- DriftEventContent, DecisionTrajectoryContent, IntentSchemaContent, SchemaDriftContent, PromptShapeContent, CapabilityMapContent — admin/audit-only content models declared for forward compatibility.
- validate_content(artifact_type, content) — coerce a raw content dict against its type's schema; the Curator calls this at commit time.
- ApolloArtifact(BaseModel) — outer envelope per artifact (id, type, version, applicability, content, rationale, as_of).
- AttachedGuidance(BaseModel) — the apollo_guidance wire payload (as_of, artifacts, rationale_summary); intentionally slim per §Injection Channel → Payload shape.
apollo/llm.py — Apollo's own LLM client (M8). Pluggable via APOLLO_LLM_PROVIDER across openai/minimax/anthropic/minimax-local/stub providers; supports both blocking complete() and token-streaming stream().
- ToolCall — dataclass; normalized provider tool-call (id, name, arguments dict).
- StreamChunk — dataclass; one delta from stream(), carrying either a content_delta or the terminal final LLMResponse.
- LLMResponse — dataclass; flat provider-agnostic response with a tolerant as_json() parser.
- LLMClient — front-door singleton with get(), reset_singleton(), install_stub_response(), install_stub_stream(), and provider-dispatching complete() / stream() calls.
apollo/maintenance.py — Hourly background loop (M13) that purges expired docs via delete_by_query, coarsens hourly→daily→weekly snapshots, reconciles orphaned artifact-history rows, and emits maintenance metrics. Also exposes read-side helpers for /stats.
- run_periodic(stop_event) — async loop; sleeps in short bounded waits so shutdown is prompt.
- run_once(now=...) — one maintenance pass; injectable now for tests, returns a summary dict.
- degraded_emitters() — scan per-service last-ingest timestamps and flag any stale beyond APOLLO_INGEST_STALE_WARN_SEC.
- intent_schema_coverage(window_hours) — percentage of recent traces carrying an intent_schema observation; None when no data in window.
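The "short bounded waits" pattern behind run_periodic(stop_event) can be sketched as below. This is an illustration under assumptions, not the module's code: the run_once callable is passed in, and the interval_sec and tick_sec parameters are hypothetical names chosen for the example.

```python
import asyncio
from typing import Awaitable, Callable


async def run_periodic(
    stop_event: asyncio.Event,
    run_once: Callable[[], Awaitable[dict]],
    interval_sec: float = 3600.0,  # hypothetical: hourly cadence
    tick_sec: float = 1.0,         # hypothetical: upper bound on each wait
) -> int:
    """Run `run_once` every `interval_sec` until `stop_event` is set.

    Returns the number of completed passes.
    """
    passes = 0
    while not stop_event.is_set():
        await run_once()
        passes += 1
        # Sleep in short slices so a shutdown signal is noticed within ~tick_sec.
        remaining = interval_sec
        while remaining > 0 and not stop_event.is_set():
            step = min(tick_sec, remaining)
            try:
                await asyncio.wait_for(stop_event.wait(), timeout=step)
            except asyncio.TimeoutError:
                pass  # slice elapsed without a stop signal
            remaining -= step
    return passes
```

Waiting on the event itself, rather than calling asyncio.sleep, lets shutdown() wake the loop immediately instead of after the full interval.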
apollo/metrics.py — Prometheus counter/gauge declarations for every Apollo metric named in the spec. Registered at import time so dashboards can be wired before drivers exist.
- Module-level metrics for ingest (INGEST_ACCEPTED_TOTAL, INGEST_QUEUE_DEPTH, etc.), guidance attach, traceparent propagation, synthesis sweep, maintenance loop, and curator mutations/policy violations/atomic failures/orphan detection.
- Guidance-attach counters/histograms: GUIDANCE_ATTACH_NULL_TOTAL (scope, reason), GUIDANCE_ATTACH_SUCCESS_TOTAL (scope), GUIDANCE_ATTACH_PAYLOAD_BYTES (scope), GUIDANCE_ATTACH_ARTIFACT_COUNT (scope), GUIDANCE_ATTACH_CAPPED_TOTAL (scope, artifact_type).
- Evaluator score-persist counters (Layer 4-A): EVALUATOR_SCORE_PERSISTED_TOTAL, EVALUATOR_SCORE_PERSIST_FAILED_TOTAL.
- Coalescer counters (Layer 6-C): COALESCER_PROPOSALS_EMITTED_TOTAL, COALESCER_MERGE_FAILED_TOTAL.
- snapshot() — read every counter/gauge into a dict shape used by GET /stats; skips synthetic _created samples.
apollo/admin/
apollo/admin/__init__.py — Package marker (M7); re-exports admin_router and SSEHub.
apollo/admin/api.py — Admin-only REST endpoints (M7→M13). Every route gated on atlasfl-admin via require_admin; covers memory CRUD, lineage, artifacts, audit, curator mutations, divergence audit, provenance, guidance preview, subscriber/SSE inspection.
- require_admin(request) — FastAPI dependency; raises 403 unless token payload carries atlasfl-admin. Returns the payload for attribution.
- SeedObservationRequest, MemoryPatchRequest, PromoteRequest, DemoteRequest, ForgetRequest, EditRequest, RollbackRequest, LearnRequest — Pydantic bodies for the corresponding mutation endpoints. PromoteRequest carries supersede: bool = False for the description-override / coalescer conflict path.
- list_memories(...), get_memory(uid), seed_memory(body), patch_memory(uid, body), forget_memory(uid) — memory CRUD; seeds flow through normal ingest.
- get_lineage(...) — cross-trace lineage merging live AttributionRegistry with persisted apollo_lineage_events; entries tagged live | persisted | live+persisted.
- get_capped_lineage(artifact_id, service=None, limit=500) — GET /lineage/capped: traces where the per-type attach cap held the artifact back (Layer 1 visibility).
- get_artifact_stats(artifact_id, since=None) — GET /artifacts/{artifact_id}/stats: per-artifact attached / capped aggregate from apollo_lineage_events.
- list_artifacts() — combined active artifacts (M9 persisted) and pending synthesis proposals (M8 in-memory).
- promote_artifact(artifact_id, body), demote_artifact(...), forget_artifact(...), edit_artifact(...), rollback_artifact(...) — Curator mutation endpoints; translate CuratorPolicyViolation to 403, DescriptionOverrideConflict to 409, ValueError to 4xx.
- trigger_learn(body) — admin-initiated synthesis pass (POST /learn); 202-accepts and runs the LLM on a background task.
- list_recommendations() — pending Evaluator demotion recommendations (M10).
- list_audit(...) — Curator audit log with action/actor/artifact/trigger/type filters.
- get_provenance(artifact_id, ...) — trace an artifact back to audit chain, source proposal, and contributing observations (handles real-trace, sweep:*, and admin triggers).
- list_divergence(...) — observations where caller_identity.username differs from emitted_by.token_subject; for audit support of Invariant 17.
- preview_guidance(scope) — preview the L1 or L3 guidance payload for the given scope.
- list_subscribers(), guidance_stream(scope) — admin-SSE inspection and live feed of Curator commits.
apollo/admin/sse.py — Process-wide SSE hub for fanning Curator commits to admin clients in real time. M7 ships the pipe empty (no Curator commits yet); synthetic broadcasts are enough to prove wiring.
- SSEHub — singleton-by-convention; subscribe(scope, username), unsubscribe(sub), broadcast(event, scope=None) (drops on slow consumer queue-full), subscribers_snapshot(), shutdown() (enqueues a sentinel), reset() (test helper).
- sse_event_stream(sub) — async generator that serializes queued events as text/event-stream bytes with retry: preamble and 15s keepalive comments.
apollo/chat/
apollo/chat/__init__.py — Package marker for Apollo's admin-chat surface (M7 read-only, M11 action tools). No re-exports.
apollo/chat/conversation.py — Redis-backed admin-chat history keyed by (username, conversation_id). Distinct from oracle's user conversation store; falls back to an in-memory dict when Redis is unreachable.
- AdminConversationStore — get(username, conversation_id), append(username, conversation_id, role, content) (trims to max turns), reset(username, conversation_id). Constructor accepts an injected client or uses a cached, health-checked one.
apollo/chat/explain.py — Three read-only helpers (M11) that surface audit records in a shape the admin-chat LLM consumes as tool_result.
- explain_decision(audit_id|artifact_id|trace_id, actor) — return matching audit record(s) plus best-effort evidence_ref resolution (observations by trace_id, target version, etc.).
- list_decisions(action, actor, since, limit, caller) — chronology of recent Curator actions; returns summary fields only.
- discuss_decision(audit_id|artifact_id, actor) — load full context bundle (audit + current artifact + upstream artifacts) for a focused multi-turn thread.
apollo/chat/server.py — Admin-chat REST endpoints (M11) + §6.2 SSE streaming. POST /chat dispatches on action (list_tools | invoke | chat); POST /chat/stream is the streaming variant. LLM drives tool selection via OpenAI-native tool-calling.
- ChatRequest, ChatStreamRequest — Pydantic bodies.
- admin_chat(body) — entry point; dispatches on action.
- admin_chat_stream(body) — streaming variant; forwards _chat_loop_events() output as SSE frames with disabled proxy buffering.
apollo/chat/tools.py — Admin-chat tool catalog (M11). Defines TOOL_DEFINITIONS metadata, TOOL_IMPLEMENTATIONS dispatch table, and the OpenAI-tools converter. Every tool takes an actor kwarg for audit attribution.
- TOOL_DEFINITIONS — list of tool dicts (name, description, parameters, mutating flag) for read tools (list_memories, get_memory, list_decisions, explain_decision, discuss_decision, lineage) and mutation tools (forget_memory, promote/demote/forget/edit/rollback_artifact, rollback_graph, trigger_synthesis, pause/resume_curator).
- to_openai_tools(definitions=None) — convert Apollo's TOOL_DEFINITIONS to OpenAI's tools=[...] JSON-Schema shape.
- list_memories, get_memory, list_decisions_tool, explain_decision_tool, discuss_decision_tool, lineage — read-tool implementations.
- forget_memory, promote_artifact, demote_artifact, rollback_artifact, forget_artifact, edit_artifact, rollback_graph, trigger_synthesis, pause_curator, resume_curator — mutation tool wrappers over oracle.curator actions.
- TOOL_IMPLEMENTATIONS — name → callable dispatch table consumed by chat/server.py.
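The conversion performed by to_openai_tools can be sketched as follows. The two catalog entries and their parameter schemas are hypothetical; only the (name, description, parameters, mutating) shape comes from the description above. The sketch assumes the mutating flag is Apollo-internal and is dropped on conversion.

```python
from typing import Any, Optional

# Hypothetical two-entry catalog in the (name, description, parameters, mutating) shape.
TOOL_DEFINITIONS: list[dict[str, Any]] = [
    {
        "name": "list_memories",
        "description": "List stored memories.",
        "parameters": {
            "type": "object",
            "properties": {"limit": {"type": "integer"}},
        },
        "mutating": False,
    },
    {
        "name": "forget_memory",
        "description": "Delete one memory by uid.",
        "parameters": {
            "type": "object",
            "properties": {"uid": {"type": "string"}},
            "required": ["uid"],
        },
        "mutating": True,
    },
]


def to_openai_tools(
    definitions: Optional[list[dict[str, Any]]] = None,
) -> list[dict[str, Any]]:
    """Project tool definitions into the `tools=[...]` function-calling shape."""
    defs = TOOL_DEFINITIONS if definitions is None else definitions
    return [
        {
            "type": "function",
            "function": {
                "name": d["name"],
                "description": d["description"],
                # Fall back to an empty object schema for parameterless tools.
                "parameters": d.get("parameters", {"type": "object", "properties": {}}),
            },
        }
        for d in defs
    ]
```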
apollo/curator/
apollo/curator/__init__.py — Package marker (M9); re-exports promote, demote, forget, edit, rollback, ApolloAuditRecord, write_audit, CuratorPolicyViolation, CuratorPaused, ActionKind, ActionRequest, allow_or_raise, is_paused, raise_if_paused, set_paused, clear_paused.
apollo/curator/actions.py — The five Curator mutation verbs (M9). Each is a policy gate → history write → store mutation → audit write → SSE broadcast atomic sequence; per-stage failures bump CURATOR_ATOMIC_FAILURES_TOTAL and unwind partial work.
- DescriptionOverrideConflict(Exception) — raised by promote() when another active artifact already overrides the same (service, tool) description; admin API maps to 409.
- ActionResult — dataclass returned by every action; carries similar_artifacts: list[dict] for the Layer 6-B promote advisory; to_dict() flattens for HTTP responses.
- promote(artifact_id, proposal, actor, rationale, trigger, evidence_ref, admin_note, supersede=False) — lift an approved proposal into the active set as version 1 (or N+1 if prior exists). When supersede=True, atomically demotes both description-override conflicts and any proposal.supersedes: [...] coalescer cluster members (Layer 6-C).
- demote(artifact_id, actor, rationale, ...) — hide from guidance without deleting; sets status="demoted" and bumps version. Drops any matching evaluator recommendation.
- forget(artifact_id, actor, rationale, ...) — delete current artifact; writes indefinite: true audit so the action is never purged.
- edit(artifact_id, actor, rationale, content_patch, applicability_patch, ...) — patch metadata or content; bumps version; requires at least one patch.
- rollback(artifact_id, target_version, actor, rationale, ...) — restore prior version's content as a new version with prev_version_id pointing at the pre-rollback record. Indefinite audit.
- _find_description_override_conflicts(*, artifact_id, applicability, content) — return active artifacts that would shadow this promote on the receiver's get_tool_description_overrides() path; gate only triggers when content.description_override and applicability.tool_name are both set.
- _embed_and_find_similar(*, artifact_id, artifact_type, content, applicability) — Layer 6: compute the new artifact's embedding and return (embedding, similar_artifacts); never raises.
- _load_active_set_for_similarity(artifact_type) — pull every status=active artifact of one type for the similarity scan.
apollo/curator/audit.py — Curator audit-log model and writer (M9). Schema per §Curator → Audit log; rationale is required non-empty.
- ApolloAuditRecord(BaseModel) — fields per SPEC-14; field validators enforce non-empty rationale and actor. as_document() computes expires_ts (null when indefinite=True, else +APOLLO_AUDIT_RETENTION_DAYS).
- write_audit(record, store=None) — persist a record, emit the canonical oracle.curator.audit log line, return the record_id.
apollo/curator/auto.py — Autonomous Curator background driver (M12). Sweeps the synthesis pending list (evolution-class proposals) and the evaluator recommendation queue (demote/fast_demote) and commits them with actor="curator_auto". Drift-class items stay for admin review.
- derive_artifact_id(proposal) — deterministic typed-prefix-plus-SHA256 id (fp_*, ip_*, ps_*, sf_*, art_*) so repeated proposals for the same logical artifact converge on the same id.
- sweep_once() — one autonomous-commit pass; short-circuits with ran=False when disabled or paused.
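One way to read "deterministic typed-prefix-plus-SHA256 id" is sketched below. The prefixes come from the list above; the type-name keys, the choice of fields that define a "logical artifact" (type plus applicability), and the 16-character digest truncation are all assumptions for illustration.

```python
import hashlib
import json

# Prefixes are from the spec; the type-name keys are assumed.
_PREFIX_BY_TYPE = {
    "failure_pattern": "fp",
    "intent_pattern": "ip",
    "prompt_shim": "ps",
    "spec_fragment": "sf",
}


def derive_artifact_id(proposal: dict) -> str:
    """Return a stable id so repeated proposals converge on one artifact."""
    artifact_type = proposal.get("artifact_type", "")
    prefix = _PREFIX_BY_TYPE.get(artifact_type, "art")
    # Assumption: logical identity = type + applicability, not content.
    identity = {
        "artifact_type": artifact_type,
        "applicability": proposal.get("applicability", {}),
    }
    # Canonical JSON (sorted keys, fixed separators) makes the hash order-independent.
    canonical = json.dumps(identity, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return f"{prefix}_{digest}"
```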
apollo/curator/pause.py — Process-wide pause flag (M11) freezing every Curator mutation. State is non-persistent by design.
- PauseState — dataclass with snapshot().
- CuratorPaused(Exception) — raised by mutations when the flag is on; carries the pause state for 409 surfaces.
- is_paused(), snapshot(), raise_if_paused() — pause-state accessors.
- set_paused(actor, rationale, admin_note=None) — flip on, write indefinite audit, broadcast SSE; idempotent.
- clear_paused(actor, rationale, admin_note=None) — flip off, write indefinite audit, broadcast SSE; idempotent.
- reset() — test helper.
apollo/curator/policy.py — Hard-invariant policy gate (M9). Every Curator action calls allow_or_raise(...) first; violations raise CuratorPolicyViolation and increment CURATOR_POLICY_VIOLATIONS_TOTAL. Keys on action shape, not artifact content.
- CuratorPolicyViolation(Exception) — carries rule id and human-readable detail.
- ActionKind(Enum) — full set of action verbs (promote/demote/forget/edit/rollback/compact + M11 pause/resume/trigger_synthesis/graph_rollback).
- ActionRequest — dataclass normalized for the gate (kind, actor, artifact_id, optional patches and target_version).
- allow_or_raise(request) — sole public entry; runs every rule check and increments the violations counter on raise.
apollo/evaluator/
apollo/evaluator/__init__.py — Package marker (M10); re-exports AttributionRegistry, cascade, recommendations, scoring, and signals public surface. M10 ships the Evaluator as an advisor (writes recommendations; M12 flips autonomous commit).
apollo/evaluator/attribution.py — Trace → applied-artifact-ids registry. Oracle's attacher records every attachment here at dispatch time so the evaluator can attribute signals back to the right artifacts.
- AttributionRegistry — record(trace_id, scope, artifact_ids), applied_for(trace_id, service_name=None), traces_with_artifact(artifact_id, service_name=None), prune(now=None) (TTLs entries via APOLLO_GRAPH_TRACE_STATE_TTL_SEC), snapshot(), reset().
- get(), reset() — module-level default-instance accessors.
apollo/evaluator/cascade.py — Upstream flagging plus DriftEvent vs silent-demote decision logic. Pure function — translates score state into a recommendation outcome.
- CascadeOutcome — dataclass (artifact_id, action, reason, upstream_flagged).
- cascade_on_l3_dominant(engine, artifact_id, upstream_ids=None) — pick drift_event (≥3 L3 signals in window), recommend_fast_demote (N=2 L3-dominant ticks), recommend_demote (N=5 sub-threshold ticks), or none. Flags upstream on every non-none branch.
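The branch selection in cascade_on_l3_dominant can be sketched with plain counters in place of the scoring engine. The thresholds (3, 2, 5) are from the description above; the function name decide_cascade, the counter arguments, and the precedence order of the branches are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CascadeOutcome:
    artifact_id: str
    action: str
    reason: str
    upstream_flagged: list[str] = field(default_factory=list)


def decide_cascade(
    artifact_id: str,
    l3_signals_in_window: int,
    l3_dominant_ticks: int,
    sub_threshold_ticks: int,
    upstream_ids: Optional[list[str]] = None,
) -> CascadeOutcome:
    """Hypothetical stand-in: pick one outcome from pre-computed counters."""
    if l3_signals_in_window >= 3:
        action, reason = "drift_event", "repeated L3 failures in window"
    elif l3_dominant_ticks >= 2:
        action, reason = "recommend_fast_demote", "L3-dominant for 2 ticks"
    elif sub_threshold_ticks >= 5:
        action, reason = "recommend_demote", "below threshold for 5 ticks"
    else:
        action, reason = "none", "no cascade condition met"
    # Upstream artifacts are flagged on every branch except "none".
    flagged = list(upstream_ids or []) if action != "none" else []
    return CascadeOutcome(artifact_id, action, reason, flagged)
```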
apollo/evaluator/persist.py — Layer 4-A — writes evaluator scores back to apollo_artifacts so the attach sort key can read them. Fire-and-forget via the event loop, kill-switched by APOLLO_EVALUATOR_PERSIST_SCORES_ENABLED, and pytest-safe.
- persist_score_to_artifact(artifact_id, score, decomposition) — schedule an ES update via a Painless script so the existing type-specific content fields survive.
apollo/evaluator/recommendations.py — Per-artifact pending demotion recommendation queue surfaced on GET /recommendations. Replace-semantics — latest recommendation per artifact wins; admin acts via the M9 demote endpoint.
- Recommendation — dataclass (artifact_id, kind, reason, evaluator_score, score_decomposition, upstream_artifact_ids, created_at); to_dict().
- RecommendationQueue — add(rec), remove(artifact_id), get(artifact_id), snapshot(), reset(), __len__.
- get_queue(), reset_queue() — module-level singleton accessors.
apollo/evaluator/scoring.py — Per-artifact rolling EMA score with preserved per-signal decomposition. Weight tiers L3-error 3.0 / schema-mismatch 3.0 / user-feedback 1.5 / evaluator-confidence 0.5; score < 0.5 triggers demotion cadence.
- SignalKind(Enum) — L3_ERROR / SCHEMA_MISMATCH / USER_FEEDBACK / EVALUATOR_CONFIDENCE.
- ArtifactScore — dataclass holding the rolling score, signal counts/magnitudes, tick counters, L3-dominant counter, recent L3 timestamps; decomposition() returns audit-ready dict.
- ScoringEngine — score_for(artifact_id), snapshot(), apply_signal(artifact_id, signal_kind, magnitude, now=None) (pure EMA math), repeated_l3_failures_in_window(artifact_id, now=None), reset().
- get_engine(), reset_engine() — module-level singleton accessors.
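The weight tiers and the 0.5 threshold are stated above, but the update rule is not. The sketch below assumes one plausible rule: each signal pulls the score toward (1 - magnitude) with a smoothing factor proportional to the signal's weight. BASE_ALPHA, the initial score of 1.0, the enum string values, and the rule itself are assumptions, not the engine's documented math.

```python
from dataclasses import dataclass, field
from enum import Enum


class SignalKind(Enum):
    L3_ERROR = "l3_error"
    SCHEMA_MISMATCH = "schema_mismatch"
    USER_FEEDBACK = "user_feedback"
    EVALUATOR_CONFIDENCE = "evaluator_confidence"


# Weight tiers as listed for apollo/evaluator/scoring.py.
WEIGHTS = {
    SignalKind.L3_ERROR: 3.0,
    SignalKind.SCHEMA_MISMATCH: 3.0,
    SignalKind.USER_FEEDBACK: 1.5,
    SignalKind.EVALUATOR_CONFIDENCE: 0.5,
}
DEMOTE_THRESHOLD = 0.5
BASE_ALPHA = 0.1  # hypothetical base smoothing factor


@dataclass
class ArtifactScore:
    score: float = 1.0  # assumed starting point: fully healthy
    counts: dict = field(default_factory=dict)


def apply_signal(state: ArtifactScore, kind: SignalKind, magnitude: float) -> float:
    """Assumed EMA step: heavier signals move the score faster."""
    alpha = min(1.0, BASE_ALPHA * WEIGHTS[kind])
    target = 1.0 - max(0.0, min(1.0, magnitude))  # magnitude 1.0 = worst case
    state.score = (1.0 - alpha) * state.score + alpha * target
    state.counts[kind] = state.counts.get(kind, 0) + 1  # per-signal decomposition
    return state.score
```

Under these assumptions, two full-magnitude L3 errors take a fresh artifact from 1.0 to about 0.49, just under the demotion threshold, while a single evaluator-confidence signal moves it only to about 0.95.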
apollo/evaluator/signals.py — Failure-signal detection: maps one observation to zero-or-more SignalHits per applied artifact. Schema-mismatch fires only when the observed service is component_kind == "agent".
- SignalHit — frozen dataclass (artifact_id, signal_kind, magnitude, source_trace_id, source_event_type).
- detect_signals(envelope, applied_artifact_ids, intent_schema_for_trace=None, confidence_gap=None) — dispatch on event_type and produce signal hits; returns [] when no artifacts attributed.
apollo/guidance/
apollo/guidance/__init__.py — Package marker; module docstring describes the two entry points (attacher in-process helpers and api admin REST inspection). No re-exports.
apollo/guidance/api.py — Apollo's public REST entry points: POST /observations (L3 ingest) and GET /stats (metric snapshot).
- post_observations(request) — accept a validated ObservationBatch; stamps caller_identity from the Bearer token when emitter didn't, always overwrites emitted_by server-side, returns 202 with accepted/dropped counts.
- get_stats() — JSON snapshot of every registered metric plus M13 additions (degraded_emitters, intent_schema_coverage) and a guidance_health block with per-scope success / null-by-reason breakdown and null-rate.
- _guidance_health(metrics) — derive the at-a-glance per-scope attach health block from the flat metrics snapshot.
apollo/guidance/attacher.py — In-process attach helpers oracle calls when composing outbound envelopes. Bounded by APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS; failures and timeouts return None so the request still succeeds without guidance.
- load_active_artifacts_from_es() — read every status=active artifact from apollo_artifacts and project to ApolloArtifact; cached for _ACTIVE_SET_TTL_SEC (5s); skipped under pytest.
- set_active_set_source(src), reset_active_set_source() — install/restore the active-set provider hook (M8 swaps in the Curator-backed source).
- for_l1(user, intent_class, caller_tags, trace_id) — build apollo_guidance for an L1 /chat response.
- for_l2(user, intent_class, caller_tags, trace_id) — build the L2 in-process payload that oracle's tool-executor folds into its prompt (M15).
- for_l3_agent(service_name, tool_name, intent_class, caller_tags, trace_id) — build the payload injected into an MCP dispatch's arguments.apollo_guidance.
- _cap_for_type(artifact_type) — return the configured per-attach cap for one artifact type, or None if uncapped.
- _safe_float(value, default) — coerce to float with a defensive fallback; lets _sort_key tolerate missing ranking signals.
- _sort_key(artifact) — five-tier priority chain consulted at cap time and read by the receiver: evaluator_score → confidence → applicability specificity → weight → as_of.
- _apply_attach_caps(matched, *, scope_label) — sort each type's matches by _sort_key and keep the top-N; returns (kept, dropped_pairs) where dropped_pairs is [(artifact_id, artifact_type), …] for held-back artifacts.
- _summarize(artifacts, capped_pairs=None) — emit a per-type summary string ("N type (id1,id2,…+M) +C capped (cid1,…)") that diffs cleanly across calls.
apollo/guidance/selectors.py — The artifact-applicability matcher. Filters an active set to those whose Applicability matches the caller context.
- match_artifacts(active_set, layer, intent_class, service_name, tool_name, caller_tags) — return artifacts whose type is in ATTACHABLE_TYPES and whose applicability fields match the caller; admin/audit-only types never leak.
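A rough sketch of the matcher, using dicts instead of the Pydantic models. The type-name strings, the rule that an unset applicability field matches anything, and the rule that an artifact's tags must all be present in caller_tags are assumptions; the description above only fixes the inputs and the ATTACHABLE_TYPES gate.

```python
from typing import Iterable, Optional

# Six attachable types; the string values are assumed spellings.
ATTACHABLE_TYPES = frozenset({
    "prompt_shim",
    "spec_fragment",
    "tool_pairing_hint",
    "failure_pattern",
    "service_connection_hint",
    "intent_pattern",
})


def _field_matches(wanted: Optional[str], actual: Optional[str]) -> bool:
    # Assumption: an unset applicability field acts as a wildcard.
    return wanted is None or wanted == actual


def match_artifacts(
    active_set: Iterable[dict],
    layer: str,
    intent_class: Optional[str] = None,
    service_name: Optional[str] = None,
    tool_name: Optional[str] = None,
    caller_tags: Iterable[str] = (),
) -> list[dict]:
    tags = set(caller_tags)
    context = {
        "layer": layer,
        "intent_class": intent_class,
        "service_name": service_name,
        "tool_name": tool_name,
    }
    matched = []
    for artifact in active_set:
        if artifact["type"] not in ATTACHABLE_TYPES:
            continue  # admin/audit-only types never leak
        app = artifact.get("applicability") or {}
        if not all(_field_matches(app.get(k), v) for k, v in context.items()):
            continue
        # Assumption: every tag the artifact requires must be offered by the caller.
        if not set(app.get("tags") or ()).issubset(tags):
            continue
        matched.append(artifact)
    return matched
```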
apollo/hooks/
apollo/hooks/__init__.py — Package marker; oracle-side in-process emission hooks. Apollo is a package inside oracle (§Package Structure), so oracle emits via direct calls to these helpers rather than HTTP. No re-exports.
apollo/hooks/chat.py — Emit helpers oracle's REST and LLM paths call to feed Apollo. Every helper is fire-and-accept; failures log and never raise into oracle's request path.
- emit_intent_schema(intent_schema, conversation_id, token_payload, trace_id=None) — emit intent_schema for an inbound L1 chat turn with an intent block.
- emit_user_prompt(prompt, conversation_id, token_payload, trace_id=None, intent_class=None) — emit user_prompt for an inbound L1 chat turn.
- emit_llm_turn(trace_id, conversation_id, token_payload, request_messages, response_content, model, ...) — emit llm_turn for each oracle LLM cycle (L2-only).
- emit_tool_output(trace_id, conversation_id, token_payload, service_name, tool_name, arguments, output, latency_ms=None) — emit tool_output after a successful MCP dispatch (oracle observes on L3's behalf); strips injected apollo_guidance from arguments.
- emit_tool_error(trace_id, conversation_id, token_payload, service_name, tool_name, arguments, error_message, ...) — emit tool_error after a failed MCP dispatch.
- emit_final_response(trace_id, conversation_id, token_payload, response, ...) — emit final_response just before oracle returns to L1.
apollo/learner/
apollo/learner/__init__.py — Package marker; extractors update Decision Graphs deterministically on every observation (M2); synthesis LLM runs event-driven (M8). No re-exports.
apollo/learner/coalescer.py — Layer 6-C — periodic background loop that finds similarity clusters of active artifacts and queues LLM-merged proposals on apollo_proposals. Off by default (APOLLO_COALESCER_ENABLED); bounded by APOLLO_COALESCER_MAX_CLUSTERS_PER_RUN.
- run_periodic(stop_event) — async loop; honors kill switch + APOLLO_COALESCER_INTERVAL_SEC.
- run_sweep_once() — one pass; returns {clusters_found, proposals_emitted, skipped}. Never raises.
apollo/learner/drift.py — Graph-anchor drift check (M8). Four deterministic sub-checks gate every LLM-produced proposal before it becomes eligible for Curator commit.
- CheckResult, DriftCheckResult — dataclasses; DriftCheckResult.approved is True iff all sub-checks passed; as_drift_event() shapes a DriftEvent body.
- check_proposed_pattern_vs_edges(proposal, outcome_graph_edges) — proposed FailurePattern must match an observed error edge.
- check_intent_classification_vs_clusters(proposal, intent_graph_nodes) — proposed IntentPattern must reference an existing intent cluster.
- check_weight_swings(proposal, existing_weights, z_threshold=None) — proposed weight must lie within APOLLO_DRIFT_Z_SCORE stdevs of prior distribution.
- check_trajectory_coherence(proposal, trajectory, tolerance=None) — proposed weight's direction of change must align with the reference EWMA trajectory.
- run_all(proposal_id, proposal, outcome_graph_edges=None, ...) — aggregate every sub-check into a DriftCheckResult.
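The weight-swing sub-check can be illustrated with a plain z-score test. The default threshold of 3.0 stands in for APOLLO_DRIFT_Z_SCORE, whose actual value is not given here; passing when fewer than two prior weights exist, and the simplified signature and return shape, are also assumptions.

```python
import math
import statistics


def check_weight_swing(
    proposed_weight: float,
    existing_weights: list[float],
    z_threshold: float = 3.0,  # hypothetical stand-in for APOLLO_DRIFT_Z_SCORE
) -> tuple[bool, float]:
    """Return (passed, z) for a proposed weight against prior weights."""
    if len(existing_weights) < 2:
        # Assumption: with no usable distribution, the check does not block.
        return True, 0.0
    mean = statistics.fmean(existing_weights)
    stdev = statistics.stdev(existing_weights)  # sample standard deviation
    if stdev == 0.0:
        z = 0.0 if proposed_weight == mean else math.inf
    else:
        z = abs(proposed_weight - mean) / stdev
    return z <= z_threshold, z
```

For prior weights of 0.4, 0.5, and 0.6 (mean 0.5, sample standard deviation 0.1), a proposal of 0.55 sits about 0.5 standard deviations out and passes, while 0.95 sits about 4.5 out and is flagged.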
apollo/learner/extractors.py — Rule-based observation → Decision Graph mutations. Deterministic; one entry per event type. Service-namespaces every naturally-service-scoped label.
- apply(envelope, graph_set) — dispatch the envelope across every extractor; per-extractor exceptions logged and swallowed so a buggy extractor can't block the others.
apollo/learner/graphs.py — In-memory state of the five Decision Graphs plus per-trace scratchpad. Idempotent per-trace — re-observing the same trace_id is a no-op against the same node/edge.
- Module constants INTENT_TOOL_GRAPH, PROMPT_SHAPE_GRAPH, SERVICE_ROUTING_GRAPH, OUTCOME_GRAPH, ITERATION_GRAPH, ALL_GRAPH_IDS.
- Node, Edge — pure dataclasses with EWMA weights on edges and outcome distributions on nodes.
- make_node_id(graph_id, kind, label), make_edge_id(graph_id, source_id, target_id) — deterministic SHA1-based ids.
- DecisionGraph — upsert_node(kind, label, trace_id, at, outcome=None, tags=None), upsert_edge(source_id, target_id, trace_id, at, success), drain_dirty() (clears the dirty sets), reset().
- GraphSet — owns the five graphs and the per-trace scratchpad; graph(graph_id), all_nodes(), all_edges(), trace_scratch(trace_id) (lazy TTL eviction), drain_all_dirty(), load_from_records(nodes, edges), reset().
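The deterministic ids and per-trace idempotency can be sketched together. The "|" join used before hashing, the alpha value, and storing seen trace ids on the edge are assumptions; the `at` timestamp argument of the real upsert_edge is omitted to keep the example short.

```python
import hashlib
from dataclasses import dataclass, field


def make_node_id(graph_id: str, kind: str, label: str) -> str:
    return hashlib.sha1(f"{graph_id}|{kind}|{label}".encode("utf-8")).hexdigest()


def make_edge_id(graph_id: str, source_id: str, target_id: str) -> str:
    return hashlib.sha1(f"{graph_id}|{source_id}|{target_id}".encode("utf-8")).hexdigest()


@dataclass
class Edge:
    edge_id: str
    weight: float = 0.0  # EWMA of success (1.0) / failure (0.0)
    count: int = 0
    seen_traces: set = field(default_factory=set)


class DecisionGraph:
    def __init__(self, graph_id: str, alpha: float = 0.2):  # alpha is hypothetical
        self.graph_id = graph_id
        self.alpha = alpha
        self.edges: dict[str, Edge] = {}
        self._dirty: set[str] = set()

    def upsert_edge(self, source_id: str, target_id: str, trace_id: str, success: bool) -> Edge:
        edge_id = make_edge_id(self.graph_id, source_id, target_id)
        edge = self.edges.setdefault(edge_id, Edge(edge_id))
        if trace_id in edge.seen_traces:
            return edge  # re-observing the same trace is a no-op
        edge.seen_traces.add(trace_id)
        sample = 1.0 if success else 0.0
        if edge.count == 0:
            edge.weight = sample
        else:
            edge.weight = (1.0 - self.alpha) * edge.weight + self.alpha * sample
        edge.count += 1
        self._dirty.add(edge_id)
        return edge

    def drain_dirty(self) -> set[str]:
        """Return the ids changed since the last drain and clear the set."""
        dirty, self._dirty = self._dirty, set()
        return dirty
```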
apollo/learner/prompts.py — Prompt templates for the synthesis LLM (M8). One build_*_prompt per flavor; every template demands strict JSON output with a documented schema. Every per-type schema requires a top-level confidence: 0.0..1.0 field; the shared _SHARED_RULES block documents its semantics.
- build_failure_pattern_prompt(observations, subgraph) — propose a FailurePattern from tool_error observations.
- build_intent_pattern_prompt(observations, subgraph) — propose an IntentPattern from L1 prompts/intent schemas.
- build_prompt_shim_prompt(intent_class, pain_points, subgraph) — propose a PromptShim that improves agent prompts for a given intent.
- build_sweep_prompt(service, intent_class, observations, active_artifacts, subgraph) — continuous-sweep prompt that may return NoProposal when no useful signal.
- build_coalesce_prompt(*, artifact_type, cluster) — Layer 6-C merger prompt: ask the LLM to write a single replacement artifact covering every cluster member's intent without redundancy.
apollo/learner/similarity.py — Layer 6-A + 6-B — artifact embedding + cosine similarity helpers. Pluggable embedder (defaults to axonis.memory.embedder); gracefully degrades to no embedding when sentence-transformers is unavailable.
- set_embedder(fn), reset_embedder(), get_embedder() — pluggable embedder interface for tests / production.
- text_for_embedding(content, artifact_type) — type-aware text extraction (PromptShim / FailurePattern / IntentPattern / ToolPairingHint / SpecFragment / ServiceConnectionHint).
- compute_embedding(content, artifact_type) — embed; returns None when text is empty or the embedder is unavailable.
- cosine_similarity(a, b) — pure-Python cosine; safe on empty / zero-magnitude / mismatched inputs.
- find_similar_active_artifacts(*, proposal_embedding, proposal_type, proposal_applicability, active_set, threshold=None, self_artifact_id=None) — scope-filtered similarity scan, returned sorted desc by similarity.
apollo/learner/snapshots.py — Hourly graph-state snapshots (§Learner → Snapshots and trajectory). Snapshots are the substrate for past-vs-current comparison and admin graph rollback.
- service_from_label(label) — recover the emitter service from a namespaced graph-node label; returns "_all" for unprefixed/universal labels.
- set_snapshot_writer(writer) — install a pluggable persistence writer (tests skip ES).
- build_snapshot(graph, at, tier="hourly") — capture one graph's full in-memory state as a snapshot document.
- snapshot_once(graph_set) — build + persist one snapshot per graph; emits an audit line.
- rollback_to_snapshot(graph_id, snapshot_id) — restore prior graph state from a snapshot; M11 minimal implementation, returns bool.
- run_periodic(graph_set, stop_event) — long-running coroutine; snapshots every APOLLO_GRAPH_SNAPSHOT_INTERVAL seconds with prompt-shutdown bounded waits.
apollo/learner/synthesis.py — Event-driven synthesis dispatcher (M8). Bridges observations to artifact proposals via LLM calls, coalesced per-trace, bounded by a semaphore, and gated by the drift check before reaching the pending list.
- ProposalRecord — dataclass for one synthesis outcome (approved or drift_flagged); to_public().
- SynthesisEngine — singleton; set_graph_getter(getter), schedule(envelope) (event-driven entry), schedule_admin_initiated(scope) (POST /learn entry), pending_snapshot() (merges ES + in-memory), clear_pending(), remove_pending(proposal_id), run_sweep_once() (continuous sweep tick).
- run_sweep_periodic(stop_event) — background loop driving run_sweep_once() on APOLLO_SYNTHESIS_SWEEP_INTERVAL_SEC cadence; gated by APOLLO_SYNTHESIS_SWEEP_ENABLED.
- _NEUTRAL_CONFIDENCE = 0.5 — module-level neutral default; applied when an LLM proposal is missing or has an unparseable confidence.
- _normalize_confidence(proposal) — coerce and clamp proposal['confidence'] into [0.0, 1.0]; mutates in place; called from _record_proposal.
apollo/learner/trajectory.py — Per-edge short-vs-long EWMA divergence projection — the primary drift signal. Pure math over the in-memory graph.
- EdgeTrajectory — dataclass (edge_id, source/target ids, weight_short/long, divergence, count).
- project(graph) — return a per-edge trajectory list sorted by abs(divergence) descending.
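The projection can be sketched over a list of edge dicts. Defining divergence as weight_short minus weight_long (so a negative value means the edge is recently doing worse than its long-run average) is an assumption, as is the input shape; the source and target id fields of EdgeTrajectory are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class EdgeTrajectory:
    edge_id: str
    weight_short: float
    weight_long: float
    divergence: float
    count: int


def project(edges: list[dict]) -> list[EdgeTrajectory]:
    """Rank edges by how far the short EWMA has moved from the long EWMA."""
    trajectories = [
        EdgeTrajectory(
            edge_id=e["edge_id"],
            weight_short=e["weight_short"],
            weight_long=e["weight_long"],
            divergence=e["weight_short"] - e["weight_long"],  # assumed sign convention
            count=e["count"],
        )
        for e in edges
    ]
    trajectories.sort(key=lambda t: abs(t.divergence), reverse=True)
    return trajectories
```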
apollo/lineage/
apollo/lineage/__init__.py — Package marker (§7.3); re-exports persist_attach, persist_capped, aggregate_artifact_stats, query_capped_for_artifact, query_traces_with_artifact, query_trace_attribution, _persistence_disabled. Durability layer for cross-trace attribution complementing the in-memory registry.
apollo/lineage/persist.py — Schedules fire-and-forget ES writes of attach + cap events to apollo_lineage_events. Stays off the attach latency budget; no-ops under pytest or when no event loop is running.
- persist_attach(trace_id, scope, artifact_ids) — schedule one denormalized row per (trace_id, scope, artifact_id) with kind: "attached"; idempotent via deterministic uid {trace_id}:{scope}:{aid}.
- persist_capped(*, trace_id, scope, capped) — schedule one row per (artifact_id, artifact_type) pair with kind: "capped" and capped: uid-prefix so it coexists with attached rows on the same triple.
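The idempotency scheme can be illustrated with a dict standing in for the index. The uid formats follow the description above ({trace_id}:{scope}:{aid}, with a capped: prefix for cap events); the row field names and the helper names attach_rows and capped_rows are hypothetical.

```python
def attach_rows(trace_id: str, scope: str, artifact_ids: list[str]) -> dict[str, dict]:
    """One denormalized row per (trace_id, scope, artifact_id), keyed by uid."""
    return {
        f"{trace_id}:{scope}:{aid}": {
            "trace_id": trace_id,
            "scope": scope,
            "artifact_id": aid,
            "kind": "attached",
        }
        for aid in artifact_ids
    }


def capped_rows(
    trace_id: str, scope: str, capped: list[tuple[str, str]]
) -> dict[str, dict]:
    """One row per held-back (artifact_id, artifact_type) pair."""
    return {
        # The prefix keeps capped rows from colliding with attached rows.
        f"capped:{trace_id}:{scope}:{aid}": {
            "trace_id": trace_id,
            "scope": scope,
            "artifact_id": aid,
            "artifact_type": atype,
            "kind": "capped",
        }
        for aid, atype in capped
    }
```

Because the uid is a pure function of the triple, writing the same attach twice overwrites the same document rather than creating a duplicate.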
apollo/lineage/queries.py — Retroactive lineage reads against apollo_lineage_events. The admin /lineage endpoint merges these with the in-memory AttributionRegistry. The attached-only queries filter out kind: "capped" rows so "applied" semantics are preserved.
- query_traces_with_artifact(artifact_id, service_name=None, limit=500) — every persisted trace where the artifact was applied (excludes capped); deduplicates scopes per trace.
- query_capped_for_artifact(artifact_id, *, service_name=None, limit=500) — every persisted trace where the cap held the artifact back; same shape as the attached query but kind=capped only.
- query_trace_attribution(trace_id, service_name=None, limit=500) — full persisted attribution for one trace (excludes capped); returns None when no rows.
- aggregate_artifact_stats(artifact_id, *, since=None, limit=1000) — {attached_count, capped_count, last_attached_at, last_capped_at} aggregate; powers the GET /artifacts/{id}/stats admin endpoint.
apollo/memory/
apollo/memory/__init__.py — Package marker; Elastic UDS-backed storage for observations, artifacts, graphs, and audit. No re-exports.
apollo/memory/bootstrap.py — Idempotent index bootstrap. ES auto-creates on first write but reads against a missing index 404; this runs at startup so reads on a fresh cluster don't 404.
- ensure_indices() — create every Apollo index from apollo/templates/* that doesn't yet exist; returns {alias: status}. Handles multi-worker race via resource_already_exists_exception swallowing.
apollo/memory/queries.py — Bypasses for UDS.read (UDS routes through an on-disk template Apollo doesn't ship). Goes straight to the underlying ES client; falls through to store.read(...) for in-memory test fakes.
- scan_all(store, size=10000) — enumerate every doc; 404 → empty dict.
- get_by_id(store, uid) — single-doc fetch; returns {uid: doc} or {}.
- search_by_query(store, query, size=1000) — filtered ES-shape query; 404 → empty dict.
apollo/memory/store.py — UDS subclasses for every Apollo Elastic index. Each owns a single alias registered in axonis-core's schema (except ApolloProposals, which is hardcoded to avoid a schema bump).
- ApolloObservations, ApolloGraphNodes, ApolloGraphEdges, ApolloGraphSnapshots — observation and decision-graph stores (M1, M2).
- ApolloArtifacts, ApolloArtifactHistory, ApolloAudit — Curator stores (M9). Artifacts holds current versions; history holds prior versions indefinitely; audit defaults to APOLLO_AUDIT_RETENTION_DAYS with indefinite=true records exempt.
- ApolloLineageEvents — denormalized cross-trace attribution events (§7.3).
- ApolloProposals — persistent synthesis pending list (M8) so multi-worker uvicorn deployments don't strand proposals per-worker. Overrides _index to avoid KeyError.
apollo/observer/
apollo/observer/__init__.py — Package marker; observation normalization and ingest entry point. No re-exports.
apollo/observer/events.py — Observation envelope and typed event-payload models. §Observation Model; token-level events are rejected — only turn-boundary/tool-invocation/error/final-response events accepted.
- EventType(Enum) — full enum (must match §Observation Model → Event types exactly).
- IntentSchemaPayload, UserPromptPayload, LLMTurnPayload, ToolOutputPayload, ToolErrorPayload, FinalResponsePayload, UserFeedbackPayload — per-event-type payload models (extra=allow).
- CallerIdentity — minimal identity from token payload (username, roles, service); records "who the work is attributed to."
- EmittedBy — server-stamped attribution of "who pushed the bytes" (token_subject, token_roles, context); auditors flag divergence from caller_identity.username.
- ObservationEnvelope — unified envelope with field-validator that re-validates payload against the event-type model.
- ObservationBatch — HTTP POST body shape for /observations.
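The split between CallerIdentity and EmittedBy can be illustrated with a small stdlib sketch. The field names come from the descriptions above; the dataclasses are stand-ins for the real models (which this listing does not show), and `diverges` is a hypothetical helper, not a function in events.py.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CallerIdentity:
    """Who the work is attributed to (taken from the token payload)."""
    username: str
    roles: Tuple[str, ...] = ()
    service: Optional[str] = None


@dataclass(frozen=True)
class EmittedBy:
    """Who pushed the bytes (stamped server-side)."""
    token_subject: str
    token_roles: Tuple[str, ...] = ()
    context: Optional[str] = None


def diverges(caller: CallerIdentity, emitter: EmittedBy) -> bool:
    """Hypothetical check: the emitting token subject differs from the attributed user."""
    return emitter.token_subject != caller.username
```

A divergence is not necessarily a problem: a service principal emitting on behalf of a user produces one by design, which is why an auditor would look at the emitter's roles before treating it as suspicious.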
apollo/observer/ingest.py — Async-queue-backed ingest loop (M1). Both in-process and HTTP intake converge here; workers drain the queue, dedup, write to ES, update graphs, and trigger synthesis + evaluator.
- Module-level graph_set: GraphSet — the in-memory mirror of the five Decision Graphs.
- startup(), shutdown() — lifecycle entry/exit called from oracle's Starlette lifespan.
- reset_state() — testing helper.
- set_writer(writer), set_graph_writer(writer) — install test writers.
- ingest(envelope) — public entry point; put_nowait onto the bounded queue, returns False on queue-full; never blocks emitter tasks.
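The non-blocking contract of ingest(envelope) can be sketched with asyncio.Queue. This is a minimal illustration, not the module's code: `make_ingest` and the tiny queue bound are hypothetical, and the real worker loop (dedup, ES write, graph update) is omitted.

```python
import asyncio
from typing import Callable, Tuple


def make_ingest(maxsize: int) -> Tuple[asyncio.Queue, Callable[[dict], bool]]:
    """Build a bounded queue plus an ingest() that never awaits."""
    queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)

    def ingest(envelope: dict) -> bool:
        # put_nowait raises QueueFull instead of suspending the caller,
        # so an emitter task is never blocked by a slow consumer.
        try:
            queue.put_nowait(envelope)
        except asyncio.QueueFull:
            return False
        return True

    return queue, ingest
```

Returning False on a full queue pushes the drop decision to the caller, which can count the drop rather than stall the request that produced the observation.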
apollo/sdk/
apollo/sdk/__init__.py — Compat shim re-exporting the canonical SDK from axonis.apollo (ApolloClient, ApolloGuidanceCache, ApolloIntegration, ApolloMCPMiddleware, current_guidance). M5 moved the SDK into axonis-core; this re-export keeps oracle-internal imports working unchanged.
Other Apollo files in oracle/
server/middleware/trace.py — W3C traceparent ingress middleware (M6). Reads / mints / validates traceparent on every inbound request, installs it on the ambient ContextVar so downstream emitters and outbound dispatches propagate the same trace-id. Skips /health and /service-info.
- TraceparentMiddleware — ASGI middleware; constructor takes optional header_name override; in best-effort mode mints replacements and increments missing/malformed counters; in required mode rejects with 400.
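The read / mint / validate decision can be sketched independently of ASGI. The header layout follows the W3C Trace Context format for version 00; the function names and the returned outcome labels are assumptions made for this sketch, not the middleware's actual interface.

```python
import re
import secrets
from typing import Optional, Tuple

# version-traceid-parentid-flags, all lowercase hex (W3C Trace Context, version 00 layout)
_TRACEPARENT_RE = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def is_valid_traceparent(value: str) -> bool:
    match = _TRACEPARENT_RE.fullmatch(value)
    if match is None:
        return False
    version, trace_id, parent_id, _flags = match.groups()
    # The spec forbids version ff and all-zero trace-id / parent-id.
    return version != "ff" and trace_id != "0" * 32 and parent_id != "0" * 16


def mint_traceparent() -> str:
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"


def resolve_traceparent(header: Optional[str], required: bool) -> Tuple[Optional[str], str]:
    """Return (traceparent, outcome); outcome is 'ok', 'missing' or 'malformed'."""
    if header is not None and is_valid_traceparent(header):
        return header, "ok"
    outcome = "missing" if header is None else "malformed"
    if required:
        return None, outcome  # the caller would reject the request with 400
    return mint_traceparent(), outcome  # the caller would bump the matching counter
```

Keeping the decision in a pure function makes both modes easy to test without standing up the server.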
server/llm/apollo_cache.py — Request-scoped Apollo guidance cache for oracle's chat LLM (L2). Oracle is a guidance subscriber for its own chat LLM, same as L1/L3; isolated per-request via ContextVar.
- get_cache() — return the current request's ApolloGuidanceCache, or the empty sentinel.
- set_cache(cache) — install a cache; returns a ContextVar token.
- reset_cache(token) — restore the prior contextvar binding; idempotent.
- populate_for_turn(user, intent_class, caller_tags, trace_id) — build a cache from the L2 attacher and install it; swallow-on-failure, returns the token.
Insomnia Test Flow
A literal, step-by-step protocol for exercising every Apollo HTTP endpoint via the Insomnia collection (developers-environment/Insomnia/APOLLO-API.yaml + AXONIS-Oracle.yaml for the /chat step). Doubles as the canonical request/response-contract reference for each endpoint: per step it pins the verb, URL, body/params, and the response shape to verify.
Format per step. Each numbered step lists exactly one request: which folder/name to fire, body / params (verbatim), and what to verify in the response.
Coverage. Every Apollo HTTP endpoint is exercised at least once. The full coverage table is at the end.
Setup
Pick a sub-environment in Insomnia (localhost - test, development - test, etc.). OAuth2 auto-applies the Bearer token to every request below.
Three env variables propagate across steps — update in the active sub-env when prompted:
| Variable | Set in step | Read in steps |
|---|---|---|
| demo_memory_uid | 4 | 5, 7, 8 |
| demo_artifact_id | 12 | 13, 14, 15, 16, 18, 19, 21, 29, 32, 33, 44 |
| demo_longevity_prefix | 36 | 37, 43 |
Local longevity simulation (steps 36, 37, 43). The longevity-seed / verify / cleanup endpoints are gated behind APOLLO_ALLOW_LONGEVITY_SEED=true on the oracle process. The flag is exported by developers-environment/oracle/apollo.env, so sourcing the standard dev env stack already enables it:
source ../developers-environment/conf/development.axonis.ai.env
source ../developers-environment/oracle/oracle.env
source ../developers-environment/oracle/apollo.env
uv run python -m server
Override to false (or unset the var) on shared / production clusters. All three endpoints return 403 with that hint in detail when the flag is off.
Steps 1–10 — stats, ingest, memory CRUD, learn
1. Stats → Stats — GET /api/v1/apollo/stats. Verify: 200; body status == "ok"; metrics block present; maintenance.last_run_at present.
2. Observation Ingest → Post Observations (batch) — POST /api/v1/apollo/observations. Body (replace default to seed both event types):
{
"observations": [
{"event_type": "tool_output", "trace_id": "trc_demo_0001", "service": "cortex",
"timestamp": "2026-05-12T15:00:00Z",
"payload": {"tool_name": "summarize", "latency_ms": 312.4, "output_size_bytes": 1024}},
{"event_type": "tool_error", "trace_id": "trc_demo_0002", "service": "cortex",
"timestamp": "2026-05-12T15:00:01Z",
"payload": {"tool_name": "summarize", "error_class": "ValidationError",
"error_message": "schema mismatch on input.cohort_id"}}
]
}
Verify: 202; body {"accepted": 2, "dropped": 0}.
3. Memories → List Memories — GET /api/v1/apollo/memories?service=cortex&limit=5. Verify: 200; both records from step 2 in observations[]; count >= 2.
4. Capture demo_memory_uid — Copy any observations[i]._id from the step 3 response into the sub-environment's demo_memory_uid.
5. Memories → Get Memory — GET /api/v1/apollo/memories/{{ _.demo_memory_uid }}. Verify: 200; same record as in step 3.
6. Memories → Seed Memory — POST /api/v1/apollo/memories. Body:
{"event_type": "tool_output", "trace_id": "trc_admin_seed_0001", "service": "oracle",
"timestamp": null, "payload": {"tool_name": "synthetic_seed", "output_size_bytes": 0}}
Verify: 201; body {"accepted": true, "trace_id": "trc_admin_seed_0001", "event_type": "tool_output"}.
7. Memories → Patch Memory — PATCH /api/v1/apollo/memories/{{ _.demo_memory_uid }}. Body:
{"tags": ["under-review", "seeded-by-admin"], "admin_note": "Flagged during flow test."}
Verify: 200.
8. Memories → Delete Memory — DELETE /api/v1/apollo/memories/{{ _.demo_memory_uid }}. Verify: 200; body {"forgotten": true, "uid": "<demo_memory_uid>"}.
9. Memories → List Memories — attribution filters — GET /api/v1/apollo/memories?caller_username=test@axonis.ai&limit=10. Enable the caller_username query param. Verify: 200; every observation's caller_identity.username == "test@axonis.ai".
10. Learn → Trigger Synthesis — POST /api/v1/apollo/learn. Body:
{"intent_class": "entity_resolution", "service_name": "cortex",
"note": "Probing the recent tool_error burst on cortex/summarize."}
Verify: 202; body {"accepted": true, "scope": {...}}. Wait ~2 seconds for the background synthesis task.
Steps 11–20 — artifacts, curator lifecycle, audit, divergence
11. Artifacts → List Artifacts — GET /api/v1/apollo/artifacts. Verify: 200; body {active: [...], pending: [...], count: {active, pending}}; at least one pending entry with status == "approved".
12. Capture demo_artifact_id — Copy any pending[i].id (where pending[i].status == "approved") into demo_artifact_id.
13. Artifacts → Promote Artifact — POST /api/v1/apollo/artifacts/{{ _.demo_artifact_id }}/promote. Body:
{"proposal_id": "<paste the same demo_artifact_id value>",
"rationale": "Flow-test promote.", "admin_note": "step 13"}
Verify: 200; body is an ActionResult: action: "promote", version: 1, non-null audit_record_id, before_version_id: null, after_version_id: "<id>:v1".
14. Artifacts → Edit Artifact — PATCH /api/v1/apollo/artifacts/{{ _.demo_artifact_id }}. Body:
{"rationale": "Tighten applicability to cortex/summarize.",
"applicability_patch": {"service_name": "cortex", "tool_name": "summarize"}}
Verify: 200; ActionResult with action: "edit", version: 2, before_version_id: "<id>:v1", after_version_id: "<id>:v2".
15. Artifacts → Rollback Artifact — POST /api/v1/apollo/artifacts/{{ _.demo_artifact_id }}/rollback. Body:
{"target_version": 1, "rationale": "Flow-test rollback to v1."}
Verify: 200; ActionResult with action: "rollback", version: 3, before_version_id: "<id>:v2", after_version_id: "<id>:v3". Underlying artifact content matches v1.
16. Artifacts → Demote Artifact — POST /api/v1/apollo/artifacts/{{ _.demo_artifact_id }}/demote. Body:
{"rationale": "Flow-test demote.", "evaluator_score": null,
"score_decomposition": null, "upstream_artifact_ids": []}
Verify: 200; ActionResult with action: "demote", version: 4, after_version_id: "<id>:v4". Underlying doc's status flips to "demoted".
17. Recommendations → List Recommendations — GET /api/v1/apollo/recommendations. Verify: 200; body {recommendations: [...], count}. May be empty pre-Evaluator signal.
18. Audit → List Audit Records — GET /api/v1/apollo/audit?artifact_id={{ _.demo_artifact_id }}&limit=20. Enable the artifact_id filter. Verify: 200; records[] contains four entries in order: promote, edit, rollback, demote, all with actor == "admin:test@axonis.ai".
19. Provenance → Trace Artifact Provenance — GET /api/v1/apollo/provenance?artifact_id={{ _.demo_artifact_id }}. Verify: 200; body {artifact, audit, proposal, contributing_observations, contributing_services, note}. audit[0].action == "promote". proposal.trigger_event_type is one of user_prompt / tool_error / tool_output / sweep / admin_initiated.
20. Divergence → List Divergence — GET /api/v1/apollo/divergence. Verify: 200; body {records: [...], count, note}. May be empty if no service-on-behalf-of-user emits have occurred yet. note mentions service principals.
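The version arithmetic verified in steps 13–16 (promote → v1, edit → v2, rollback → v3 with v1's content, demote → v4) is consistent with an append-only version history in which rollback writes a new version rather than rewinding. The sketch below models that reading; `ArtifactLedger` and the "active" status value are hypothetical, and the real Curator may store versions differently.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ArtifactLedger:
    artifact_id: str
    versions: List[dict] = field(default_factory=list)  # versions[0] is v1

    def _append(self, action: str, content: dict) -> dict:
        before: Optional[str] = (
            f"{self.artifact_id}:v{len(self.versions)}" if self.versions else None
        )
        self.versions.append(copy.deepcopy(content))
        version = len(self.versions)
        return {
            "action": action,
            "version": version,
            "before_version_id": before,
            "after_version_id": f"{self.artifact_id}:v{version}",
        }

    def promote(self, content: dict) -> dict:
        return self._append("promote", {**content, "status": "active"})

    def edit(self, patch: dict) -> dict:
        return self._append("edit", {**self.versions[-1], **patch})

    def rollback(self, target_version: int) -> dict:
        # Copy the old content forward; history is never truncated.
        return self._append("rollback", self.versions[target_version - 1])

    def demote(self) -> dict:
        return self._append("demote", {**self.versions[-1], "status": "demoted"})
```

Under this model the four audit entries expected in step 18 fall out naturally: every action appends exactly one version.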
Steps 21–31 — lineage, guidance preview, SSE, admin chat, L1 chat
21. Lineage → Lineage by Artifact — GET /api/v1/apollo/lineage?artifact_id={{ _.demo_artifact_id }}&include_observations=true&observations_limit=50. Verify: 200; body {lineage, count, filter, note}. Each entry has source ∈ {live, persisted, live+persisted}.
22. Lineage → Lineage by Trace — GET /api/v1/apollo/lineage?trace_id=trc_demo_0001&include_observations=true. Verify: 200; one entry with trace_id == "trc_demo_0001", applied field present.
23. Guidance Preview → Preview L1 Guidance — GET /api/v1/apollo/guidance?scope=l1. Verify: 200; body {scope: "l1", guidance: <AttachedGuidance dict or null>}.
24. Guidance Preview → Preview L3 Guidance (cortex) — GET /api/v1/apollo/guidance?scope=l3:cortex. Verify: 200; body {scope: "l3:cortex", guidance: <AttachedGuidance dict or null>}.
25. Admin SSE → List Subscribers — GET /api/v1/apollo/subscribers. Verify: 200; body {subscribers: [...], count}. Likely empty until step 26 connects.
26. Admin SSE → Subscribe to Guidance Stream — GET /api/v1/apollo/guidance/stream?scope=*. Open in a separate tab; Insomnia keeps the SSE response open. Verify: Stream stays connected. Re-running step 25 in another tab now lists this subscriber.
27. Admin Chat → List Tools — POST /api/v1/apollo/chat. Body: {"action": "list_tools"}. Verify: 200; body {tools: [...], milestone, mode}. Each tool entry carries name, description, parameters, mutating.
28. Admin Chat → Invoke Tool (direct) — POST /api/v1/apollo/chat. Body:
{"action": "invoke", "tool": "list_decisions", "arguments": {"action": "demote", "limit": 20}}
Verify: 200; response carries the tool's raw return value (LLM bypassed).
29. Admin Chat → Chat (buffered) — POST /api/v1/apollo/chat. Body:
{"action": "chat", "message": "Why was {{ _.demo_artifact_id }} demoted?",
"conversation_id": "conv-flow-test-1"}
Verify: 200; body {response, tool_calls, iterations, conversation_id, timeout?}. The LLM should pick a tool that surfaces the audit row.
30. Admin Chat → Chat Stream (SSE) — POST /api/v1/apollo/chat/stream. Body:
{"message": "List recent demote decisions.", "conversation_id": "conv-flow-test-stream"}
Verify: SSE events fire in order: session_start → tool_call → tool_result → response_delta* → response → done. error replaces the trailing pair on LLM failure.
31. AXONIS-Oracle → Oracle Gateway → Chat / LLM → Chat — POST /api/v1/chat. Body: default body in collection (message + intent_schema). Verify: 200; response includes apollo_guidance field on the envelope (null when no active artifacts match L1 scope, dict otherwise).
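The event order listed in step 30 can be checked mechanically once the SSE event names are collected from the stream. This is a sketch under one assumption not stated above: that tool_call / tool_result pairs may repeat or be absent. Tighten the pattern if the server always emits exactly one pair.

```python
import re
from typing import Sequence

# One letter per event name so the whole stream can be matched as a string.
_CODES = {
    "session_start": "S",
    "tool_call": "C",
    "tool_result": "R",
    "response_delta": "D",
    "response": "F",
    "done": "X",
    "error": "E",
}

# session_start, (tool_call tool_result)*, response_delta*, then response+done or error.
_VALID_ORDER = re.compile(r"S(CR)*D*(FX|E)")


def is_valid_event_order(events: Sequence[str]) -> bool:
    try:
        encoded = "".join(_CODES[name] for name in events)
    except KeyError:
        return False  # unknown event name
    return _VALID_ORDER.fullmatch(encoded) is not None
```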
Steps 32–44 — capped lineage, stats aggregate, ops, effectiveness, cleanup
32. Lineage → Capped Lineage by Artifact — GET /api/v1/apollo/lineage/capped?artifact_id={{ _.demo_artifact_id }}&limit=200. Verify: 200; body {artifact_id, service_name_filter, records, count, note}. Each record carries {trace_id, scopes, registered_at}. Likely empty for a freshly-promoted demo artifact — cap pressure builds with real traffic.
33. Artifacts → Artifact Stats Aggregate — GET /api/v1/apollo/artifacts/{{ _.demo_artifact_id }}/stats. Verify: 200; body {artifact_id, attached_count, capped_count, last_attached_at, last_capped_at}. Numbers reflect the last 1000 lineage events for this artifact. Add ?since=2026-05-01T00:00:00Z to narrow the window.
34. Ops → Migrate Lineage Mapping — POST /api/v1/apollo/ops/migrate-lineage-mapping?dry_run=true. Verify: 200; body {dry_run: true, index, already_present: [...], would_add: [...]}. Re-fire without dry_run to perform the additive PUT _mapping that adds artifact_type + kind to the existing apollo_lineage_events index. Idempotent. Run this once per environment that pre-dates Layer 6.
35. Ops → Re-embed Artifacts — POST /api/v1/apollo/ops/reembed-artifacts?dry_run=true&limit=1000. Verify: 200; body {dry_run, processed, embedded, skipped_already_have, skipped_no_text, skipped_no_embedder, errors, updated_ids}. Re-fire with dry_run=false to write content.embedding_vector back to active artifacts that pre-date Layer 6-A. Idempotent. When sentence-transformers isn't installed every artifact lands in skipped_no_embedder.
36. Ops → Longevity Seed — POST /api/v1/apollo/ops/longevity-seed?dry_run=true&days=30&observations=2000&proposals=60&audit=60&lineage=1500&artifacts=40. Prereq: APOLLO_ALLOW_LONGEVITY_SEED=true. Verify: 200; body {dry_run, run_prefix, since, until, days, observations:{...}, proposals:{...}, audit:{...}, lineage:{attached,capped}, artifacts:{...}, cleanup_hint}. Re-fire with dry_run=false to write the backdated synthetic data. Copy run_prefix into _.demo_longevity_prefix for steps 37 and 43.
37. Ops → Longevity Verify — GET /api/v1/apollo/ops/longevity-verify?run_prefix={{ _.demo_longevity_prefix }}. Prereq: same APOLLO_ALLOW_LONGEVITY_SEED=true. Verify: 200; body {run_prefix, total_docs, docs_by_index}. docs_by_index totals should equal the planned counts in step 36's response. A mismatch indicates a per-store write failure.
38. Effectiveness → Summary (1d) — GET /api/v1/apollo/effectiveness/summary?window=1d. Verify: 200; smallest of the four window snapshots.
39. Effectiveness → Summary (7d) — GET /api/v1/apollo/effectiveness/summary?window=7d. Verify: 200; each section's count ≥ step 38's matching count.
40. Effectiveness → Summary (30d) — GET /api/v1/apollo/effectiveness/summary?window=30d. Verify: 200; primary read. After step 36's seed, every section shows the seeded counts as a floor (never assert equality, only floor). One section per surface — observations, synthesis, curator, attach, artifacts, evaluator — plus the resolved window, since, until. Supports window=1d|7d|30d|90d, or override with since=ISO8601 (+ optional until).
41. Effectiveness → Summary (90d) — GET /api/v1/apollo/effectiveness/summary?window=90d. Verify: 200; widest of the four window snapshots. With a 30-day seed this matches step 40's counts.
42. Effectiveness → Trend (1d/7d/30d/90d) — GET /api/v1/apollo/effectiveness/trend?windows=1d,7d,30d,90d. Verify: 200; body {buckets, skipped, as_of, rollups:{1d, 7d, 30d, 90d}}. Each entry in rollups carries the same shape as /effectiveness/summary. Healthy Apollo has observations.total / synthesis.proposals_created / curator.total_actions / attach.attached_total monotonically non-decreasing as windows widen.
43. Ops → Longevity Cleanup — POST /api/v1/apollo/ops/longevity-cleanup?run_prefix={{ _.demo_longevity_prefix }}. Prereq: same APOLLO_ALLOW_LONGEVITY_SEED=true. Verify: 200; body {run_prefix, total_deleted, deleted_by_index}. Re-firing on an already-cleaned prefix returns total_deleted: 0. After cleanup, re-fire step 37 — it should report total_docs: 0.
44. Artifacts → Forget Artifact — cleanup — DELETE /api/v1/apollo/artifacts/{{ _.demo_artifact_id }}. Body:
{"rationale": "Flow-test cleanup — irreversible delete."}
Verify: 200; subsequent GET /artifacts no longer surfaces the id in either active or pending.
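The monotonicity expectation in step 42 is easy to assert in a script once the /effectiveness/trend body is parsed. The sketch below assumes rollups is a dict keyed by window whose sections expose the four counters named in that step; `non_monotonic_metrics` is a hypothetical helper and the sample numbers used to exercise it are made up.

```python
from typing import Dict, List

WINDOWS = ("1d", "7d", "30d", "90d")
METRICS = (
    ("observations", "total"),
    ("synthesis", "proposals_created"),
    ("curator", "total_actions"),
    ("attach", "attached_total"),
)


def non_monotonic_metrics(rollups: Dict[str, dict]) -> List[str]:
    """Return the metrics whose value shrinks as the window widens."""
    violations = []
    for section, key in METRICS:
        values = [rollups[window][section][key] for window in WINDOWS]
        if any(narrow > wide for narrow, wide in zip(values, values[1:])):
            violations.append(f"{section}.{key}")
    return violations
```

An empty list means every counter is non-decreasing from 1d through 90d; a non-empty list names the counters to investigate.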
Coverage check
Every endpoint in apollo/admin/api.py + apollo/guidance/api.py + apollo/chat/server.py is exercised at least once:
| Endpoint | Step |
|---|---|
| GET /stats | 1 |
| POST /observations | 2 |
| GET /memories | 3, 9 |
| GET /memories/{uid} | 5 |
| POST /memories | 6 |
| PATCH /memories/{uid} | 7 |
| DELETE /memories/{uid} | 8 |
| POST /learn | 10 |
| GET /artifacts | 11 |
| POST /artifacts/{id}/promote | 13 |
| PATCH /artifacts/{id} | 14 |
| POST /artifacts/{id}/rollback | 15 |
| POST /artifacts/{id}/demote | 16 |
| GET /recommendations | 17 |
| GET /audit | 18 |
| GET /provenance | 19 |
| GET /divergence | 20 |
| GET /lineage | 21, 22 |
| GET /guidance | 23, 24 |
| GET /subscribers | 25 |
| GET /guidance/stream | 26 |
| POST /chat (3 actions) | 27, 28, 29 |
| POST /chat/stream | 30 |
| POST /api/v1/chat (L1 attach) | 31 |
| GET /lineage/capped | 32 |
| GET /artifacts/{id}/stats | 33 |
| POST /ops/migrate-lineage-mapping | 34 |
| POST /ops/reembed-artifacts | 35 |
| POST /ops/longevity-seed | 36 |
| GET /ops/longevity-verify | 37 |
| GET /effectiveness/summary | 38, 39, 40, 41 |
| GET /effectiveness/trend | 42 |
| POST /ops/longevity-cleanup | 43 |
| DELETE /artifacts/{id} | 44 |
Auth-failure smoke check (optional)
The OAuth token is auto-applied to every request above. To verify the auth gate, open a separate tab, remove the Authorization header, and re-fire any admin endpoint: expect 401 Unauthorized (missing or invalid token). To verify the role gate, re-fire with a valid token that lacks the atlasfl-admin role: expect 403 Forbidden.
Failure modes
| Symptom | Diagnosis |
|---|---|
| 401 on every step | Token expired — re-auth via Insomnia's OAuth panel. |
| 403 forbidden on mutations | Caller's token lacks atlasfl-admin. |
| 403 curator_paused | Curator is paused. POST a resume via step 27/28 first. |
| 400 on step 2 | Envelope failed schema validation; detail lists the bad fields. |
| 400 APOLLO_REQUIRE_INTENT_SCHEMA on step 31 | Required-mode flip is on (§Q12). Add an intent_schema block to the body. |
| 503 on step 31 | No LLM provider configured. Set ANTHROPIC_API_KEY / OPENAI_API_KEY / GROQ_API_KEY. |
| Empty pending: [] after step 10 | Synthesis produced no proposals. Push more observations (step 2) and retry. |
| 404 on step 13 | demo_artifact_id doesn't match any pending proposal. Re-run step 11 and re-capture. |
| 409 on step 13 | Proposal status isn't approved (likely drift_flagged). Pick a different pending entry. |
Future Improvements & Considerations
A living backlog of deferred work, hygiene items, and design considerations that came up during the M0–M14 build but were not addressed immediately. Each entry includes the context, the proposed change, rough effort, and priority so a future engineer can pick an item up without re-discovering why it exists.
Beacon (L1) wiring is deferred — beacon has no HTTP connection to oracle today (MCP_SERVER_URL defaults to cortex direct, see §2.1 below), so attached apollo_guidance has no path into beacon's process. Tracked as item §2.3 below. Parallax was originally a Phase 1 subscriber but is also deferred; its wiring follows the cortex pattern when it onboards.
What this is NOT. This is not a bug list — every item here is either (a) deliberately deferred with a design rationale, (b) hygiene that ships-ready code can tolerate, or (c) an enhancement waiting on production signal. Anything urgent should be an issue, not an entry here.
Cross-references. Items marked "SPEC-OOS" are already tracked in §Implementation Plan → Out of scope; they're restated here with operational framing.
Legend
| Priority | Meaning |
|---|---|
| P1 — Soon | Would meaningfully improve ops/DX; pick up when a relevant milestone lands nearby |
| P2 — Watch | Not blocking today; flip to P1 if production data shows pressure |
| P3 — Future phase | Tracked for spec completeness; waits on prerequisites outside Apollo's scope |
| Effort | Meaning |
|---|---|
| S | < half day |
| M | 1–2 days |
| L | multi-day |
Currently active
Quick-reference of items that have not shipped. Items marked ✅ in the body below are historical and grouped under §Changelog (Layer items in §12).
| Item | Priority | Effort | One-line |
|---|---|---|---|
| §2.1 — Beacon default MCP_SERVER_URL bypasses oracle | P2 | S/M | Config docs (S) or change default (M) |
| §2.3 — Beacon (L1) onboarding to oracle's chat surface | P1 | L | First L1 caller wiring; gates §4.1's required-mode flip |
| §2.4 — Distribution model for subscriber-facing axonis-core modules | P2 | S | Decide: pip extra vs always-installed |
| §2.5 — Additional L3 emitter onboarding | P2 | M | Parallax / prism / sentinel after the cortex pattern |
| §3.1 — Production-grade minimax-local provider | P2 | mixed | Knobs 1-3 shipped; knobs 4-5 deferred |
| §4.1 — Flip APOLLO_REQUIRE_INTENT_SCHEMA=true | P2 | S | Needs §2.3 (beacon L1) + emitter coverage proof |
| §4.2 — Flip APOLLO_REQUIRE_TRACEPARENT=true | P2 | S | Same gating as §4.1 |
| §8.1 — Keycloak client-credentials grant | P2 | M | Replace pre-populated AUTHORIZATION with proper grant |
| §10.1 — Design-journey screenshot / gif | P3 | S | Presentation polish |
| §11.1 — Absorb oracle's existing memory modules into Apollo | P3 | — | Effectively resolved organically (see body) |
| §12.9 — Cap-defaults empirical study | 🚧 In flight | — | Wait for telemetry; not code work |
2 · Service integration
2.1 — Beacon's default MCP_SERVER_URL bypasses oracle
Priority: P2 — Watch · Effort: S (config doc); M (change the default).
Context. developers-environment/beacon/beacon.env ships with MCP_SERVER_URL=http://localhost:8000/mcp — pointing at cortex directly. In that configuration beacon never reaches oracle; Apollo sees zero traffic.
For the Apollo scenario to work from beacon (not from curl), the operator must override to http://localhost:8001/agentspace/mcp.
Proposed change. Two options: (1) Doc-only (current path) — §Scenario calls this out; (2) Change the default — flip beacon.env's default to oracle's MCP. Option 2 is the right production choice but a breaking change for anyone running beacon alongside cortex without oracle. Left at option 1 until a production deployment exercises this end-to-end.
Unblocks. Beacon's /chat UI becomes the natural demo surface for Apollo without per-deployment config gymnastics.
2.3 — Beacon (L1) onboarding to oracle's chat surface
Priority: P1 — Soon · Effort: L (architecture decision + implementation).
Direction. Oracle's POST /api/v1/chat is the production user-facing chat surface, driven by oracle's own LLM tool-use loop. It is the L1 surface for any client that wants Apollo guidance applied automatically. Beacon currently streams direct to upstream LLM providers and does not call oracle. Closing the loop on the L1 attach side requires routing beacon through /api/v1/chat — at which point beacon receives apollo_guidance in every response body and feeds its local ApolloGuidanceCache.update(...), exactly as the L1 contract from M3 + M5 specifies.
POST /api/v1/apollo/chat remains a separate admin-scoped surface that runs Apollo's independent MiniMax LLM for talking to Apollo. It is not the L1 path and is not on a deprecation track.
Open implementation details:
1. Beacon's transport. (a) beacon's backend becomes a thin streaming proxy to /api/v1/chat so beacon's existing chat UI keeps working with no UX change; or (b) keep beacon's direct-to-LLM path and have beacon pull guidance via a separate endpoint (GET /api/v1/apollo/guidance?scope=l1, currently admin-only — would need its own role relaxation). (a) is the simpler model.
2. Streaming. /api/v1/chat returns a buffered ChatResponse today; production beacon UX needs streaming.
3. Conversation persistence. Already wired on /api/v1/chat via oracle/server/memory/conversation.py (Redis-backed). Beacon onboarding inherits it for free.
4. Tool catalog visibility. Decide whether tool_calls should be exposed in the streaming response so beacon can render them.
Trigger. First production deployment that needs beacon to consume Apollo guidance via L1 attach.
2.4 — Distribution model for subscriber-facing axonis-core modules
Status: ⏸ On hold pending trigger. Resolved for cortex: it depends on axonis-core>=0.1.0 directly and imports ApolloGuidanceCache from the canonical post-flatten path (axonis.apollo.guidance_cache).
What's open — neither question has a consumer pushing on it today:
1. Split a thin axonis-sdk sub-package? ApolloGuidanceCache and Spec (formerly LLMSpec) are pure-stdlib. A thin package would let truly-isolated agents depend on a small surface without taking the full axonis-core.
2. Or move axonis-core's heavy deps to optional extras? axonis-core[elastic], axonis-core[transport], etc. Bigger refactor; affects every existing consumer.
Trigger to revisit: parallax or beacon cannot take on axonis-core, exercising Q15's vendoring branch, or a third pure-stdlib SDK module joins the family. Neither has happened.
2.5 — Additional L3 emitter onboarding
Status: ⏸ Owned by individual service teams — each service declares its own component_kind and (if needed) installs ApolloMCPMiddleware. No Apollo code change required.
Per-service onboarding state (audited 2026-05-12):
| Service | apollo refs in source | Path to onboard |
|---|---|---|
| parallax | 1 file | Closest to ready. Same shape as cortex — install ApolloMCPMiddleware in its __main__.py and declare component_kind="agent". |
| athena | 0 files | Library-kind. Declare component_kind="library" — Apollo filters it out of guidance attach. Oracle observes the MCP round-trip and emits on its behalf. |
| testament | 0 files | Same as athena — likely library-kind. Owning team confirms. |
| titan | 0 files | Same — owning team confirms component_kind. |
| UDS, rest/fedai-rest | n/a | Library-kind by design. |
Integration paths (unchanged from §Ingest Semantics):
- In-process relay (default) — service is reachable from oracle's MCP dispatch. No code change in the service; just declare component_kind.
- Direct POST via ApolloClient (fallback) — service runs outside oracle's MCP dispatch reach. Service imports ApolloClient and POSTs to /api/v1/apollo/observations.
Trigger. Each service's owning team picks this up when they're ready. Apollo's side requires no work.
3 · LLM provider hardening
3.1 — Production-grade minimax-local provider
Status: Knobs 1–3 shipped 2026-05-12; knobs 4–5 still deferred (no consumer pushing on them — production uses APOLLO_LLM_PROVIDER=openai with APOLLO_LLM_BASE_URL pointed at a hosted MiniMax endpoint; the local provider is a dev / air-gapped-lab fallback). M8 shipped the canonical HF load signature; recent additions are backward-compatible.
| # | Knob | Status | Env override / notes |
|---|---|---|---|
| 1 | Custom model path | ✅ Shipped | APOLLO_LLM_LOCAL_MODEL_PATH — absolute path on a mounted shared fs. Empty → HF cache default. |
| 2 | Thread-pool offload of the HF forward pass | ✅ Shipped | asyncio.to_thread wraps both complete() and the buffered stream() fallback. |
| 3 | Device mapping + quantization knobs | ✅ Shipped | APOLLO_LLM_LOCAL_DEVICE_MAP, APOLLO_LLM_LOCAL_TORCH_DTYPE, APOLLO_LLM_LOCAL_LOAD_IN_8BIT, APOLLO_LLM_LOCAL_LOAD_IN_4BIT. 4-bit wins if both quantization flags are true. |
| 4 | Pre-pull orchestration + readiness gate | ⏸ Deferred | Block worker start until the ~40GB checkpoint is resident and a warm-up forward pass succeeded. Affects oracle's lifespan. |
| 5 | Streaming tokens through LLMClient | ⏸ Deferred | Required for admin-chat UX. Affects every provider's contract — should be a separate cross-provider change. |
Trigger for knobs 4 + 5. An operator commits to an on-prem MiniMax deployment (#4) or admin-chat surfaces a UX need for streaming (#5).
Knob 4 crosses from the provider into oracle's lifespan (gate the ready signal, distinguish "process up" from "model loaded", Kubernetes readinessProbe coordination); the ready-threshold semantics are unspecified without a real consumer. Knob 5 affects every provider's contract — OpenAI/Anthropic stream natively, minimax-local has a buffered single-chunk fallback; a unified streaming abstraction is a cross-provider design exercise. Both punted until there's a real consumer to design against.
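How the shipped knobs 1 and 3 might resolve from the environment can be sketched as follows. The env var names and the "4-bit wins" rule come from the table above; the set of truthy strings, the "empty means unset" handling for every field, and the `resolve_local_llm_options` name are assumptions of this sketch, not the provider's actual code.

```python
from typing import Mapping, Optional

_TRUTHY = {"1", "true", "yes"}  # assumed truthy spellings


def _flag(env: Mapping[str, str], name: str) -> bool:
    return env.get(name, "").strip().lower() in _TRUTHY


def resolve_local_llm_options(env: Mapping[str, str]) -> dict:
    """Collect the local-provider knobs; 4-bit takes precedence over 8-bit."""
    load_4bit = _flag(env, "APOLLO_LLM_LOCAL_LOAD_IN_4BIT")
    load_8bit = _flag(env, "APOLLO_LLM_LOCAL_LOAD_IN_8BIT")
    quantization: Optional[str] = "4bit" if load_4bit else ("8bit" if load_8bit else None)
    return {
        # Empty model path falls back to the HF cache default (represented as None).
        "model_path": env.get("APOLLO_LLM_LOCAL_MODEL_PATH") or None,
        "device_map": env.get("APOLLO_LLM_LOCAL_DEVICE_MAP") or None,
        "torch_dtype": env.get("APOLLO_LLM_LOCAL_TORCH_DTYPE") or None,
        "quantization": quantization,
    }
```

In a running process the function would be called with os.environ; passing a plain dict keeps the precedence rule testable without touching the real environment.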
4 · Configuration + required-mode flips
4.1 — Flip APOLLO_REQUIRE_INTENT_SCHEMA=true
Status: ⏸ Awaiting production coverage signal — enforcement code now in place. Phase-1 audit found the env var was declared but no handler read it; fixed 2026-05-12:
- New intent_schema: dict | None = None field on ChatRequest.
- /chat handler checks the flag at ingress and, when it is true and the field is missing, returns 400 with an error detail referencing APOLLO_REQUIRE_INTENT_SCHEMA.
- New apollo.hooks.chat.emit_intent_schema(...) helper. When the request carries an intent_schema block, oracle emits the observation before emit_user_prompt so both share the same trace id.
- Tests pin all four cases.
Remaining work to flip in production. Watch /stats → intent_schema_coverage ≥ 0.90 for a rolling 7-day window, then set the flag. One env-var change; no Apollo code change.
4.2 — Flip APOLLO_REQUIRE_TRACEPARENT=true
Status: ⏸ Awaiting emitter coverage signal — enforcement code already in place. Verified 2026-05-12: server/middleware/trace.py:71 reads the flag and returns 400 with a (missing|malformed) detail when required. Best-effort mode (default) mints a replacement and counts apollo_missing_traceparent_total / apollo_malformed_traceparent_total.
Remaining work to flip in production. Watch apollo_missing_traceparent_total rate near zero for a rolling 7-day window, then set the flag.
5 · Maintenance + retention
(No active items — snapshot tier-generation is tracked under §Implementation Plan post-M13 deferred work; the maintenance loop's purge path shipped in M13.)
7 · Observation + audit
7.3 — Persistent attributions for retroactive lineage (✅ Shipped)
Status. Shipped 2026-05-09. apollo_lineage_events Elastic index, apollo/lineage/{persist,queries}.py module, and /lineage extended to merge live (AttributionRegistry) and persisted sources. Entries tagged source: live | persisted | live+persisted. Retention bounded by APOLLO_LINEAGE_RETENTION_DAYS (default 90).
8 · Keycloak + auth
8.1 — Keycloak client-credentials grant for service-to-service auth
Status: ⏸ Blocked on platform-level Keycloak work (tracked in SPEC-PLATFORM-03 as pending). Apollo's side is already done: the existing emitted_by attribution path (shipped 2026-05-11) correctly handles service-principal tokens. When a request authenticates with a token whose roles include "service", both caller_identity and emitted_by are stamped from the token without special casing. The /divergence audit endpoint already treats service-role emits as legitimate divergence rather than forgery.
What happens when Keycloak's grant lands. An operator does the following with no Apollo code change:
1. Configure Keycloak to issue a client-credentials grant for an apollo-emitter service account.
2. Set APOLLO_SERVICE_TOKEN=<token> on background-worker environments.
3. Workers POST to /api/v1/apollo/observations with Authorization: Bearer $APOLLO_SERVICE_TOKEN.
4. Oracle's OAuthMiddleware validates the token through normal Keycloak introspection.
5. Apollo's ingest handler stamps emitted_by.token_subject = "apollo-emitter@service.axonis.ai" and emitted_by.token_roles = ["service"].
Auditors can then flag any divergence via GET /divergence.
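Steps 2 and 3 of the list above amount to one authenticated POST from the worker. The sketch below builds that request with the standard library without sending it; the base URL is a placeholder, `build_observation_request` is a hypothetical helper, and the token is assumed to have been read from APOLLO_SERVICE_TOKEN by the caller.

```python
import json
import urllib.request
from typing import List


def build_observation_request(base_url: str, token: str, observations: List[dict]) -> urllib.request.Request:
    """Prepare (but do not send) a batch POST to the observations endpoint."""
    body = json.dumps({"observations": observations}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/apollo/observations",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

A worker would pass the result to urllib.request.urlopen and treat a 202 response as success, matching the ingest contract shown in the Insomnia flow.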
Unblocks. Background observation ingest without user context. Scheduled synthesis jobs. Federation of artifacts.
9 · Test coverage
9.1 — Live deployment integration test (✅ Shipped)
Status. Shipped 2026-05-12 at oracle/tests/integration/test_live_scenario.py. The whole module is skipped via pytest.skip unless APOLLO_LIVE_TEST=true. A staging CI job sets the flag plus the APOLLO_LIVE_* env vars to exercise the live path.
Covers: Oracle /health reachable; Apollo /stats returns status: "ok" with a real Bearer token; round-trip POST synthetic observation → assert it appears in /memories?trace_id=… → clean up; /chat returns the apollo_guidance field. Auth tries the Keycloak client-credentials grant first and falls back to the password grant (see §8.1).
10 · Spec + docs hygiene
10.1 — Design-journey screenshot / gif for presentations
Status: ⏸ Requires a recording session, not a code change. Operator runs the §Scenario Step 6 flow (admin chat "promote it") against a live cluster while screen-recording the SSE terminal showing the fan-out, then commits a ~30-second gif + a link from §Design Journey.
10.2 — Full-spec index in specs/ (✅ Shipped)
Status. Shipped 2026-05-12 at oracle/specs/README.md. One-paragraph-per-doc index keyed by audience, plus pointers to the adjacent docs under oracle/docs/.
11 · Memory module consolidation
11.1 — Absorb oracle's existing memory modules into Apollo
Status: ✅ Effectively resolved organically. When the original entry was written, three oracle-local modules overlapped with Apollo's surface. Status per module, audited 2026-05-12:
| Original module | Today |
|---|---|
| oracle/server/memory/conversation.py | Live, but a 12-line re-export shim: from axonis.memory.store import Store as ConversationStore. The shim survives because oracle's /chat and tests import ConversationStore from the oracle-local path. Replacing it with a direct axonis-core import is mechanical when the rename ripple settles. |
| oracle/server/memory/cross_service.py | Already deleted. No remaining callers. The strict-per-service MemoryService model (Invariant 17, shipped 2026-05-11) + Apollo's cross-service guidance channel cover what this used to do. |
| oracle/server/models/memory.py | Deleted 2026-05-12. The directory had become dead: server/models/__init__.py was importing a removed module, with zero consumers. Removed the whole server/models/ directory. |
Trigger for further work. None — what remains is the load-bearing ConversationStore shim, which is intentional (call-site stability across the axonis-core rename).
12 · Attach prioritization
Mostly historical. Items 12.1–12.8 shipped 2026-05-18 (the seven-layer prioritization rebuild). Only §12.9 (cap-defaults empirical study) is still in flight, and that's a data-collection task, not code. The golden-state contract for these layers is §Prioritization Layers.
12.1 — Capped-artifact observability (✅ Shipped)
Status. Shipped 2026-05-18. When the per-type attach cap holds an artifact back, a row lands in apollo_lineage_events with kind: "capped" + artifact_type + scope + trace_id. New queries: query_capped_for_artifact, aggregate_artifact_stats. New admin endpoints: GET /api/v1/apollo/lineage/capped, GET /api/v1/apollo/artifacts/{id}/stats. The pre-existing attached-only queries filter capped rows out by default.
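As a rough illustration of the row shape and the default filter, the sketch below builds a capped row and hides it unless asked. The artifact_id field and both function names are hypothetical; only kind, artifact_type, scope, and trace_id come from the description above.

```python
from typing import Iterable, List


def capped_row(artifact_id: str, artifact_type: str, scope: str, trace_id: str) -> dict:
    """Shape of a lineage row for an artifact held back by the attach cap."""
    return {
        "kind": "capped",
        "artifact_id": artifact_id,  # assumed field name
        "artifact_type": artifact_type,
        "scope": scope,
        "trace_id": trace_id,
    }


def select_rows(rows: Iterable[dict], include_capped: bool = False) -> List[dict]:
    """Attached-only view by default; capped rows appear only on request."""
    return [r for r in rows if include_capped or r.get("kind") != "capped"]
```

Keeping the filter off by default means existing callers of the attached-only queries see the same results as before the capped rows were introduced.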
12.2 — Sort key priority chain + per-type caps (✅ Shipped)
Status. Shipped 2026-05-18. apollo.guidance.attacher._sort_key consults evaluator_score → confidence → applicability specificity → weight → as_of. Mirror key in axonis.apollo.guidance_cache._priority_key. Per-type caps (APOLLO_ATTACH_CAP_*) keep the wire payload bounded; cap drops are deterministic (lowest-weight first) and counted by apollo_guidance_attach_capped_total.
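The priority chain reads naturally as a lexicographic tuple key. The sketch below is a simplified stand-in, not the shipped _sort_key. It assumes higher values rank first, missing signals count as zero, specificity is an integer, and as_of is an ISO-8601 string. It also drops the lowest-ranked artifacts overall once a type hits its cap, which simplifies the lowest-weight-first rule described above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def sort_key(artifact: dict) -> Tuple[float, float, int, float, str]:
    """Lexicographic key: earlier fields dominate, later ones break ties."""
    return (
        artifact.get("evaluator_score") or 0.0,
        artifact.get("confidence") or 0.0,
        artifact.get("specificity") or 0,  # assumed integer specificity
        artifact.get("weight") or 0.0,
        artifact.get("as_of") or "",       # ISO strings sort chronologically
    )


def apply_caps(
    artifacts: List[dict], caps: Dict[str, int]
) -> Tuple[List[dict], List[dict]]:
    """Rank best-first, then hold back anything beyond its per-type cap."""
    ranked = sorted(artifacts, key=sort_key, reverse=True)
    kept_per_type: Dict[str, int] = defaultdict(int)
    attached: List[dict] = []
    capped: List[dict] = []
    for artifact in ranked:
        kind = artifact["type"]
        limit = caps.get(kind)  # no entry means the type is uncapped
        if limit is not None and kept_per_type[kind] >= limit:
            capped.append(artifact)
        else:
            kept_per_type[kind] += 1
            attached.append(artifact)
    return attached, capped
```

Because sorted is stable and the key is a pure function of the artifact, the same input always yields the same attached and capped sets, which is the determinism property the status entry calls out.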
12.3 — Ranking-signal contract pin (✅ Shipped)
Status. Shipped 2026-05-18. _content_from_proposal is documented + tested to preserve evaluator_score, confidence, and weight through promote. TestRankingSignalContract fails the build if any of the three signals is added to the strip list.
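One way such a contract pin could look is sketched below. The strip-list contents and function names are hypothetical; the real _content_from_proposal and TestRankingSignalContract may be structured differently.

```python
from typing import Iterable

RANKING_SIGNALS = frozenset({"evaluator_score", "confidence", "weight"})

# Hypothetical strip list: fields dropped when a proposal is promoted.
STRIP_ON_PROMOTE = frozenset({"proposal_id", "review_notes"})


def content_from_proposal(proposal: dict) -> dict:
    """Copy a proposal into artifact content, minus the stripped fields."""
    return {k: v for k, v in proposal.items() if k not in STRIP_ON_PROMOTE}


def check_ranking_signal_contract(strip_list: Iterable[str] = STRIP_ON_PROMOTE) -> None:
    """Fail loudly if any ranking signal would be stripped on promote."""
    leaked = RANKING_SIGNALS & set(strip_list)
    if leaked:
        raise AssertionError(f"ranking signals in strip list: {sorted(leaked)}")
```

Running the check as part of the test suite turns an accidental edit to the strip list into a build failure rather than a silent ranking regression.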
12.4 — Evaluator score writeback (✅ Shipped)
Status. Shipped 2026-05-18. apollo/evaluator/persist.py:persist_score_to_artifact writes the in-memory scoring engine's score + decomposition back to content.evaluator_score after every signal application. Fire-and-forget; never blocks the ingest hot path. Kill switch: APOLLO_EVALUATOR_PERSIST_SCORES_ENABLED.
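The fire-and-forget pattern with a kill switch might look like the following asyncio sketch. It is not the shipped persist_score_to_artifact: the helper names, the default-enabled behaviour, and the way the flag is parsed are all assumptions.

```python
import asyncio
import os
from typing import Awaitable, Callable, Mapping, Optional, Set

# Strong references so scheduled tasks are not garbage-collected mid-flight.
_pending: Set["asyncio.Task"] = set()


def persist_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Kill switch; treating an unset flag as enabled is an assumption."""
    flag = env.get("APOLLO_EVALUATOR_PERSIST_SCORES_ENABLED", "true")
    return flag.strip().lower() != "false"


async def _swallow(coro: Awaitable[None]) -> None:
    """Keep writeback failures off the caller; real code would log here."""
    try:
        await coro
    except Exception:
        pass


def schedule_writeback(
    persist: Callable[[str, float], Awaitable[None]],
    artifact_id: str,
    score: float,
    env: Mapping[str, str] = os.environ,
) -> Optional["asyncio.Task"]:
    """Schedule the write and return immediately; must run inside a loop."""
    if not persist_enabled(env):
        return None
    task = asyncio.get_running_loop().create_task(
        _swallow(persist(artifact_id, score))
    )
    _pending.add(task)
    task.add_done_callback(_pending.discard)
    return task
```

The ingest path calls schedule_writeback and moves on without awaiting the task, so a slow or failing store write does not delay observation handling.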
12.5 — Synthesis confidence emission (✅ Shipped)
Status. Shipped 2026-05-18. Every synthesis prompt schema requires confidence: 0.0..1.0; _SHARED_RULES explains the semantics. _normalize_confidence clamps + coerces in _record_proposal.
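A clamp-and-coerce step of this kind can be sketched as follows. This is not the shipped _normalize_confidence; in particular, falling back to 0.0 for missing or unparseable values is an assumption.

```python
import math
from typing import Any


def normalize_confidence(raw: Any, default: float = 0.0) -> float:
    """Coerce an LLM-emitted confidence to a float in [0.0, 1.0].

    Assumption: unparseable, missing, or NaN values fall back to `default`.
    """
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return default
    if math.isnan(value):
        return default
    return min(1.0, max(0.0, value))
```

Coercion matters because model output often arrives as a string such as "0.7", and clamping keeps an out-of-range value like 1.8 from dominating the sort key described in §12.2.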
12.6 — Deepened rationale summary + per-artifact aggregation (✅ Shipped)
Status. Shipped 2026-05-18. _summarize now names attached + capped artifact IDs per type, truncating to top-5 with a +N tail. aggregate_artifact_stats exposes the same data per artifact.
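The top-5 truncation with a +N tail can be illustrated with a small sketch. The output format and function name are hypothetical; only the truncation rule comes from the entry above.

```python
from typing import Dict, List


def summarize_ids(ids_by_type: Dict[str, List[str]], top: int = 5) -> str:
    """List up to `top` IDs per artifact type, then a +N tail for the rest."""
    parts = []
    for kind in sorted(ids_by_type):
        ids = ids_by_type[kind]
        shown = ", ".join(ids[:top])
        extra = len(ids) - top
        tail = f" +{extra}" if extra > 0 else ""
        parts.append(f"{kind}: {shown}{tail}")
    return "; ".join(parts)
```

With seven hint IDs and one rule ID, the summary names the first five hints, appends +2, and lists the single rule in full.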
12.7 — Promote-time similarity advisory (✅ Shipped)
Status. Shipped 2026-05-18. apollo/learner/similarity.py reuses axonis.memory.embedder (gated by the [memory] extra). Promote computes the new artifact's embedding, stores it on content.embedding_vector, and scans active artifacts at the same scope for matches above APOLLO_SIMILARITY_THRESHOLD (default 0.9). Returns hits in ActionResult.similar_artifacts — promote still succeeds.
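The advisory scan can be sketched with plain cosine similarity. The section does not name the similarity metric, so cosine is an assumption here, as are the function names and the in-memory dict of embeddings.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similar_artifacts(
    new_vec: Sequence[float],
    active: Dict[str, Sequence[float]],
    threshold: float = 0.9,
) -> List[str]:
    """IDs of active artifacts whose embedding scores above the threshold."""
    return [aid for aid, vec in active.items() if cosine(new_vec, vec) > threshold]
```

The result is advisory only: a caller would attach the returned IDs to the promote result and let the promotion proceed regardless.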
12.8 — Curator-time similarity sweep (✅ Shipped)
Status. Shipped 2026-05-18. apollo/learner/coalescer.py is a fifth background loop. Each tick partitions active artifacts by (type, service, tool), union-finds clusters above APOLLO_COALESCER_THRESHOLD (default 0.85), and calls Apollo's LLM via build_coalesce_prompt to write a merger. Merger lands on apollo_proposals carrying supersedes: [...]. promote() extended to honor the list. Off by default (APOLLO_COALESCER_ENABLED=false).
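The clustering step within one (type, service, tool) partition can be sketched with a small union-find. The similarity callable and function name are hypothetical stand-ins; the shipped coalescer may batch or index comparisons differently.

```python
from itertools import combinations
from typing import Callable, Dict, List


def cluster(
    ids: List[str],
    similarity: Callable[[str, str], float],
    threshold: float = 0.85,
) -> List[List[str]]:
    """Group IDs whose pairwise similarity exceeds the threshold.

    Links are transitive: if a~b and b~c, all three share a cluster
    even when a and c are not directly similar.
    """
    parent = {i: i for i in ids}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(ids, 2):
        if similarity(a, b) > threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    # Singletons have nothing to merge, so only multi-member groups remain.
    return [g for g in groups.values() if len(g) > 1]
```

The transitivity is worth noting when tuning APOLLO_COALESCER_THRESHOLD: a chain of near-duplicates can pull loosely related artifacts into a single merger proposal.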
12.9 — Cap-defaults empirical study (🚧 In flight — needs accumulated data)
Need. The per-type caps (§12.2) and similarity thresholds (§12.7, §12.8) ship with reasonable defaults but no production data behind them. Once §12.1's lineage rows + §12.4's score writebacks have run for a representative period (on the order of weeks), revisit:
- Are the per-type caps biting on the right artifacts? apollo_guidance_attach_capped_total{scope, artifact_type} shows distribution.
- Should the cap shape be artifact-count or token-budget?
- Are similarity thresholds (0.9 promote-time / 0.85 sweep-time) tight enough to prevent over-coalescing, yet loose enough to catch duplicates?
Trigger. Not code work — an empirical pass once telemetry is available. Output is a short addendum to §Prioritization Layers recommending any default changes.
Picking items off this list
Suggested triage when considering what to pick up:
1. If a related milestone is re-opening (e.g., a future M14b for beacon's L1 wiring or a milestone for parallax onboarding), pick P1 items in that subsystem.
2. If production data shows pressure, promote the matching P2 to P1.
3. If an external prerequisite lands, unblock the P3 it gates (Keycloak client-credentials → 8.1; production MiniMax endpoint → 3.1).
No item is urgent. All are tracked here so they don't get rediscovered cold. Maintain by appending items as they're identified; remove items when they're addressed (link the commit in the commit message).
Depends on: component.beacon.workbench, component.cortex.intelligence, component.oracle.gateway, platform.axonis-core, platform.observability, platform.service-contract
Required by: component.beacon.ticketing, component.oracle.gateway