Apollo — System-Wide Observation, Learning, and Guidance Layer
Purpose
Apollo is the reasoning and memory layer over the platform's LLM activity. It sits inside oracle, observes every LLM and tool interaction across a three-layer system, distills durable artifacts from those observations, and exposes learned guidance back to the layers that need it.
Apollo is an observer, learner, and advisor. It does not execute workflows, does not call tools, does not retry failed requests, and does not interrupt live LLM calls. Iteration is driven by layer 1 (the front-end prompt/schema generator); Apollo's role is to make each successive iteration better informed than the last.
Apollo has its own LLM, its own memory, and an autonomous curator that maintains its own artifacts — but empowerment is strictly bounded to Apollo's internal state. Apollo cannot change auth, guardrails, token scopes, or user data.
Three-Layer Context
┌──────────────────────────────────────────────────────────────┐
│ Layer 1: Front-end │
│ - generates prompts + schemas for requests │
│ - consumes Apollo's guidance to shape future prompts │
│ - decides when to re-run a workflow │
└────────────────────┬─────────────────────────────────────────┘
│ intent, prompt, schema
▼
┌──────────────────────────────────────────────────────────────┐
│ Layer 2: Oracle / Apollo │
│ Oracle: auth, routing, LLM dispatch, tool aggregation │
│ Apollo: observe, learn, advise, curate │
└────────────────────┬─────────────────────────────────────────┘
│ tool calls, sub-LLM calls
▼
┌──────────────────────────────────────────────────────────────┐
│ Layer 3: Backend agents + libraries │
│ - agents (parallax, cortex, ...) — LLM-driven │
│ - libraries (UDS, athena, ...) — operational, no LLM │
│ - execute domain logic, return outputs │
│ - oracle observes the MCP round-trip and emits on their │
│ behalf (L3 never addresses Apollo directly) │
│ - agents consume injected guidance; libraries do not │
└──────────────────────────────────────────────────────────────┘
Apollo observes the full lineage of each request: intent (layer 1) → routing and LLM reasoning (layer 2) → execution and outcome (layer 3) → final response back to layer 1.
Communication with L1 and L3 is unidirectional and oracle-mediated. Oracle is Apollo's sole emitter: it extracts L1 signals from /chat request bodies and observes L3 outputs across the MCP round-trip, calling oracle.oracle.observer.ingest in-process on both layers' behalf — L1 and L3 never address Apollo directly (§Invariants 14, §Ingest Semantics). Guidance flows back the same way, attached to envelopes already travelling: onto /chat responses for L1, onto outbound MCP dispatches for agent-kind L3 services, all on the envelope's ambient auth with no service token or long-lived connection (§Injection Channel). Oracle is also a guidance subscriber for its own chat LLM (L2): the tool-executor at oracle/server/llm/tool_executor.py consults a process-local ApolloGuidanceCache each turn, no transport involved (§L2 path). Admins are the only exception — they use GET /api/v1/apollo/guidance for inspection and may POST to /api/v1/apollo/observations for replay/seed.
Apollo does not observe L1's or L3's internal LLM turns. llm_turn events are emitted by oracle (L2) only. Apollo learns about L1, L2, and L3 LLMs indirectly — from L1/L3 outputs (intent_schema / user_prompt for L1; tool_output / tool_error for L3), from oracle's own llm_turn and final_response (L2), and from outcome correlation — and improves them prospectively by injecting updated guidance into their prompt context.
Trace Propagation
Apollo relies on a single trace_id shared across every observation emitted for one end-to-end request. Trace propagation follows the W3C Trace Context standard (traceparent header) end-to-end across L1, L2, and L3.
This is the concrete realization of the OpenTelemetry aspiration noted in SPEC-PLATFORM-03 (Oracle) and introduces no conflict with axonis-core: neither axonis-core nor SPEC-PLATFORM-01/02/03 currently defines a trace, request-id, or correlation-id header. The only header propagation today is Authorization via axonis_core.gateway.client.extract_http_headers() (SPEC-01). Apollo's adoption is additive.
Header
traceparent: 00-<trace-id 32 hex>-<parent-span-id 16 hex>-<flags 2 hex>
Format: W3C Trace Context Level 1. Apollo uses only the trace-id segment for lineage stitching; parent-span-id and flags are preserved for standards compliance and future OpenTelemetry interop but are not interpreted by Apollo's lineage layer.
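A minimal sketch of pulling the trace-id segment out of a traceparent value, assuming the strict four-segment version-00 layout shown above. The function name parse_traceparent is hypothetical and is not an axonis-core API.

```python
import re
from typing import Optional

# W3C Trace Context Level 1: version-traceid-parentid-flags, lowercase hex.
_TRACEPARENT_RE = re.compile(
    r"^(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<parent_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})$"
)


def parse_traceparent(header: str) -> Optional[dict]:
    """Return the four segments as a dict, or None if the header is malformed."""
    match = _TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    parts = match.groupdict()
    # The W3C spec forbids version "ff" and all-zero trace-id / parent-id values.
    if parts["version"] == "ff":
        return None
    if set(parts["trace_id"]) == {"0"} or set(parts["parent_id"]) == {"0"}:
        return None
    return parts
```

Only parts["trace_id"] would feed lineage stitching; the other segments are carried along untouched.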
Who mints
- L1 mints the root traceparent on every new request and sets it on the outbound HTTP call to oracle /chat (and equivalent endpoints). L1 does not call Apollo directly (§Invariants 14); oracle re-emits L1-origin observations in-process, reusing the same trace-id.
- If oracle receives a request without a traceparent header (e.g., a pre-W3C client), oracle mints one, logs a missing_traceparent telemetry event, and surfaces the minted trace-id in the response so callers can correlate if they choose.
How it travels
| Hop | Carrier |
|---|---|
| L1 → L2 (HTTP to oracle) | traceparent request header |
| L2 → L3 (MCP tool dispatch) | traceparent HTTP header on the POST to the service's MCP endpoint (same transport as the existing Authorization forward) |
| L2 → L3 (HTTP fallback, non-MCP) | traceparent request header |
| L3 → L2 (MCP response → oracle) | traceparent is preserved by oracle's MCP client across the round-trip; oracle stamps the same trace_id on the tool_output / tool_error envelope it emits in-process |
| Admin seed → Apollo (POST /observations) | traceparent request header and trace_id field in the envelope |
| Out-of-process emitter → Apollo (secondary) | traceparent request header and trace_id field in the envelope (envelope is authoritative) |
Oracle is the only L2 hop and is responsible for forwarding the inbound traceparent unchanged on every downstream call that belongs to the same request. Oracle never re-mints mid-request.
axonis-core integration
Trace header propagation ships as an additive change to axonis-core — it lives with the existing cross-service header plumbing, not in oracle-only code:
- axonis_core.gateway.client.extract_http_headers(): extended to forward traceparent alongside Authorization. This is the single source of truth for cross-service header propagation and is used by both MCPClient and RestClient.
- axonis_core.gateway.mcp_client.MCPClient: reads traceparent from the inbound request context and sets it as an HTTP header on outbound MCP POSTs, alongside the existing Authorization forward.
- axonis_core.gateway.rest_client.RestClient: reads traceparent from the inbound context and sets it as an HTTP header on outbound REST calls.
- ApolloClient (SPEC-PLATFORM-14 §Ingest Semantics, in axonis-core): used by admin replay and any future out-of-process emitter. It reads the ambient traceparent from request context, sets it as an HTTP header on every POST /api/v1/apollo/observations call, and copies it into the envelope's trace_id field (the envelope wins on conflict; see §Envelope mapping). Phase-1 emitters do not use this client; oracle emits in-process and carries trace_id on the envelope it builds directly.
No new dependency is added to axonis-core — parsing the 4-segment traceparent string is a handful of lines; no OpenTelemetry SDK is required. A future OpenTelemetry integration can consume the same header without change.
Envelope mapping
Apollo's observation envelope fields map to W3C Trace Context as follows:
| Envelope field | W3C source | Purpose |
|---|---|---|
| trace_id | traceparent.trace-id (32-hex) | Shared by all events for one end-to-end request |
| parent_trace_id | not derived from traceparent | Set by emitter only when this trace is a sub-request spawned from a separate enclosing trace (e.g., a scheduled background workflow). Null otherwise. |
parent_trace_id is not the same as W3C parent-span-id. Apollo does not track span hierarchy within a single trace — its per-event observation cadence (§Observation Model → Observation cadence) makes span-level granularity unnecessary. parent_trace_id is used only for cross-trace fork linkage.
Configuration
- APOLLO_TRACE_HEADER: header name. Default traceparent (W3C). Configurable only to ease staged rollout against pre-W3C emitters; always traceparent in production.
- APOLLO_REQUIRE_TRACEPARENT: when true, oracle rejects inbound requests without a valid traceparent. Default false through Phases 1–2 (oracle mints on absence). Flip to true in Phase 3 alongside APOLLO_REQUIRE_INTENT_SCHEMA once emitter coverage is proven.
Failure posture
- Missing header (best-effort): oracle mints, logs missing_traceparent, serves the request. Lineage still stitches because the minted trace-id flows downstream and is used by oracle's own observations.
- Missing header (required mode): oracle rejects with 400; emitter must include traceparent.
- Malformed header: oracle rejects with 400 in required mode; logs malformed_traceparent and mints a replacement in best-effort mode.
- Envelope trace_id differs from header: the envelope value wins, since it is the emitter's authoritative signal. Oracle logs the discrepancy for diagnostics.
Package Structure
Apollo lives inside oracle as a set of subsystem packages (after the apollo/ → oracle/ flatten), mounted into oracle's Starlette app at /api/v1/apollo/*. It is not a separate service and has no __main__.py of its own — the oracle invariant ("oracle is the only externally exposed service", SPEC-03 §Invariants 1) is preserved. Subsystems, all under oracle/ in the oracle repo:
- oracle/observer/: observation intake, with ingest.py (normalize → memory write → evaluator fan-out) and events.py (Pydantic event models).
- oracle/memory/: store.py (axonis-core UDS stores for every apollo_* index).
- oracle/learner/: synthesis.py (event-driven LLM synthesis), graphs.py (Decision Graphs), extractors.py, snapshots.py, drift.py (graph-anchor check), prompts.py, coalescer.py, similarity.py.
- oracle/guidance/: api.py (admin inspection endpoints), attacher.py (in-process attach to /chat + MCP dispatch), selectors.py (intent → artifact matching).
- oracle/curator/: actions.py, policy.py, audit.py, auto.py (autonomous Curator).
- oracle/evaluator/: scoring.py, signals.py, cascade.py, attribution.py, persist.py, recommendations.py.
- oracle/lineage/: cross-trace attribution persistence (apollo_lineage_events).
- oracle/chat/: admin-only conversational interface + its tools.
- oracle/artifacts.py, oracle/schema.py, oracle/llm.py: typed artifact schemas, Schema/INDICES, Apollo's own LLM client.
Routes mount from server/__main__.py.
Observation Model
Event types
Apollo recognizes the following event types. In Phase 1 every one of them is emitted by oracle, in some cases on behalf of L1 or L3:
| Event type | Emitter | Purpose |
|---|---|---|
| intent_schema | Oracle (from L1 /chat request body) | Front-end's generator schema for this request |
| user_prompt | Oracle (from L1 /chat request body) | Concrete prompt produced from the intent schema |
| llm_turn | Oracle (layer 2) | One LLM request/response cycle inside oracle |
| tool_output | Oracle (from L3 MCP response) | Successful tool execution: inputs, outputs, latency |
| tool_error | Oracle (from L3 MCP response) | Tool failure: inputs, error message, stack trace, latency |
| final_response | Oracle | What was returned to layer 1 at the end of a conversation turn |
| user_feedback | Oracle (from L1 feedback submission) | Thumbs up/down, correction, explicit follow-up signal |
Emission paths are covered in detail in §Ingest Semantics. In summary: every Phase-1 event (L1-origin, L2-origin, and L3-origin) is emitted by oracle in-process — neither L1 nor L3 addresses Apollo directly (§Invariants 14). POST /api/v1/apollo/observations remains mounted for admin replay/seed and for future services running outside oracle's MCP dispatch reach.
Observation payload
All observations share a common envelope — ObservationEnvelope (oracle/observer/events.py), the only shape Apollo accepts on either ingest path. Top-level fields: event_type (the EventType enum), lineage (trace_id, parent_trace_id, conversation_id), service, timestamp, the two attribution axes caller_identity and emitted_by, and an event-specific payload (one of the typed _Payload subclasses in the same module).
caller_identity vs emitted_by. Apollo records two attribution axes per observation:
- caller_identity: application-asserted. Who the work is attributed to. Set by the emitter (often a service token stamping observations on behalf of an end user; e.g., cortex emits caller_identity.username="alice" because alice's /chat request fanned out to cortex). The handler stamps this from the Bearer token only when the envelope didn't carry one.
- emitted_by: server-stamped, unforgeable. Who actually pushed the bytes. Always overwritten by Apollo's ingest handler (HTTP path) or in-process emit helper (oracle hosts Apollo). Carries the validated token subject, roles, and a context ("http" or "in_process"). Emitters cannot forge it; the handler ignores any inbound value and stamps from request.state.token_payload.
Audit query: rows where caller_identity.username != emitted_by.token_subject and emitted_by.token_subject is not a known service principal → flag for review. The two-axis model preserves the legitimate cross-attribution pattern (services emitting per-user observations) while making forging detectable.
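The audit predicate can be sketched over in-memory rows as follows. The row layout (nested caller_identity and emitted_by dicts) mirrors the field names above, but the function name and the known_service_principals set are assumptions made for illustration; a real query would presumably run against the observations index.

```python
def flag_for_review(rows, known_service_principals):
    """Return rows whose asserted user differs from the token subject,
    unless the emitter is a known service principal."""
    flagged = []
    for row in rows:
        username = row["caller_identity"]["username"]
        subject = row["emitted_by"]["token_subject"]
        if username != subject and subject not in known_service_principals:
            flagged.append(row)
    return flagged
```

A service stamping observations for alice is left alone, while an ordinary user token asserting someone else's username is surfaced.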
Observation cadence (locked)
Apollo records one observation per:
- Turn boundary (each LLM request/response cycle)
- Tool invocation (tool_output or tool_error)
- Error
- Final response returned to layer 1
Apollo does not record per-token events. Token-level observation is too noisy for the learner and would cause drift in learned artifacts. This is a drift-prevention decision.
Lineage
Every observation carries a trace_id derived from the W3C traceparent header propagated across L1 → L2 → L3 (§Trace Propagation). Related observations (all events from one end-to-end request) share the same trace_id. Cross-trace sub-requests (e.g., scheduled background workflows spawned from a chat turn) use parent_trace_id for hierarchy.
Memory Model
Apollo's memory is two-tiered:
- Raw observations: the events listed above, stored in the Elastic apollo_observations index. High volume, time-boxed retention.
- Learned artifacts: structured, versioned objects produced by the Learner. Stored in the Elastic apollo_artifacts index. Low volume, long-lived.
Both indices use the Memory(UDS) class from axonis_core.userspace.intelligence as their UDS primitive (SPEC-01 §Memory Pattern), specialized via subclassing. Apollo does not re-implement the storage surface.
Artifact types
| Artifact | Description |
|---|---|
| DecisionGraph | A specialized graph of decision points and transitions (see §Decision Graphs) |
| DecisionTrajectory | Smoothed trajectory of a graph's evolution over time |
| DriftEvent | Flagged structural shift in a decision graph requiring review or explanation |
| IntentPattern | Recurring front-end intent → successful tool/service routing and output shape |
| IntentSchema | Known layer 1 generator schemas Apollo has learned to recognize |
| SchemaDrift | Layer 1 started emitting a new or changed schema; flagged for admin review |
| PromptShape | Recurring prompt structure correlated with good/bad outcomes |
| ToolPairingHint | "Tool X is usually followed by tool Y in successful runs" |
| FailurePattern | Known failure mode with diagnostic signature and recommended remediation |
| ServiceConnectionHint | "For intent of class Q, service S gives better results than service S'" |
| SpecFragment | Short, targeted spec snippet relevant to a class of intent |
| PromptShim | System-prompt addition that improves outcomes for a class of intent |
| CapabilityMap | Distilled view of which services can satisfy which intents |
Each artifact is a Pydantic model in oracle/artifacts.py (see §Package Structure) backed by a UDS class. Artifacts are versioned; see §Curator.
Index mappings and templates
Every Apollo index is a flat Elasticsearch index (not a data stream, no ILM policy). Mappings are shipped as JSON templates under oracle/oracle/templates/, following the same convention as athena/core/templates/*_mapping.json:
- apollo_observations_mapping.json
- apollo_artifacts_mapping.json
- apollo_artifact_history_mapping.json
- apollo_graph_nodes_mapping.json
- apollo_graph_edges_mapping.json
- apollo_graph_snapshots_mapping.json
- apollo_audit_mapping.json
Every mapping includes the standard UDS block (uds.timestamp, uds.username, uds.visibility), create_ts, update_ts, schema_version, and — for time-limited indices — an expires_ts date field (same pattern as athena/core/templates/memory_mapping.json). Every index follows the Memory(UDS) / Elastic base-class pattern from SPEC-01 so that CRUD goes through axonis_core.elastic.Elastic.
Retention
Retention is application-managed, not Elastic-ILM-managed. This matches the codebase convention: axonis-core and athena do not configure ILM policies, rollovers, or data streams. Each Apollo document that has a bounded lifetime carries an expires_ts field; a periodic maintenance task (see below) runs Elastic.delete_by_query filtering on expires_ts < now() to reclaim space.
| Class | Index | Expiry mechanism | Retention |
|---|---|---|---|
| Raw observations | apollo_observations | expires_ts = create_ts + 30d set on write | 30 days |
| Graph snapshots (hot) | apollo_graph_snapshots | expires_ts set by coarsening task (see below) | Hourly granularity for 7 days |
| Graph snapshots (warm) | apollo_graph_snapshots | Daily snapshots retained after coarsening | Daily granularity for 30 days |
| Graph snapshots (cold) | apollo_graph_snapshots | Weekly snapshots retained after coarsening | Weekly granularity for 90 days total |
| Learned artifacts | apollo_artifacts | No expires_ts; lifecycle driven by Curator | Indefinite; forgotten by admin or Evaluator-demoted N cycles |
| Artifact history | apollo_artifact_history | No expires_ts | Indefinite (rollback substrate) |
| Audit log | apollo_audit | expires_ts = create_ts + 90d or null for indefinite | ≥ 90 days (configurable) |
Maintenance task. A periodic background job (default hourly, configurable via APOLLO_MAINTENANCE_INTERVAL) performs:
1. delete_by_query on any index where expires_ts < now()
2. Coarsening on apollo_graph_snapshots: hourly rows older than APOLLO_SNAPSHOT_HOURLY_TO_DAILY_AGE_DAYS (default 7) are grouped by (graph_id, calendar date); the most recent row in each group is re-tagged tier="daily" and the rest deleted. Same shape at the daily→weekly boundary: dailies older than APOLLO_SNAPSHOT_DAILY_TO_WEEKLY_AGE_DAYS (default 30) collapse to one weekly row per (graph_id, ISO week). Both windows are env-overridable; see apollo/settings.py for the documented operator profiles, validation rules, and storage trade-offs.
3. Optional Learner-driven compaction of observations near TTL into apollo_artifacts summaries (event-driven: compaction runs on admin-initiated synthesis or guidance miss, not in this maintenance pass).
The maintenance task uses axonis_core.elastic.Elastic.delete_by_query — no Apollo-specific Elastic client.
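The hourly→daily coarsening in step 2 can be sketched in memory as below. This is an assumption-laden illustration: rows are plain dicts, tier="hourly" is assumed as the pre-coarsening tag (only tier="daily" appears above), calendar dates are taken from create_ts as given, and the real task would issue Elastic updates and deletes rather than return lists.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def coarsen_hourly_to_daily(rows, now, age_days=7):
    """Return (rows_to_retag_as_daily, rows_to_delete) for aged hourly rows."""
    cutoff = now - timedelta(days=age_days)
    groups = defaultdict(list)
    for row in rows:
        if row["tier"] == "hourly" and row["create_ts"] < cutoff:
            # Group by (graph_id, calendar date of the snapshot).
            groups[(row["graph_id"], row["create_ts"].date())].append(row)
    retag, delete = [], []
    for group in groups.values():
        group.sort(key=lambda r: r["create_ts"])
        retag.append({**group[-1], "tier": "daily"})  # most recent row survives
        delete.extend(group[:-1])                      # the rest are removed
    return retag, delete
```

The daily→weekly pass would have the same shape with the grouping key swapped to (graph_id, ISO week), for example via create_ts.isocalendar().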
Learner
Apollo's Learner is LLM-driven, graph-anchored. Apollo's LLM (see §Apollo's LLM) is the primary engine of synthesis: it processes observations as they arrive (event-driven — see §LLM synthesis below), creates and refines artifacts, classifies intents, diagnoses outcomes, and drives admin chat. The decision graphs are supplemental — they provide deterministic grounding that keeps the LLM anchored and prevents it from drifting.
The relationship is: the LLM reasons flexibly; the graphs remember rigidly. Every LLM call reads the relevant graph state as grounding context. Every LLM output is checked against the graph's trajectory. The LLM cannot propose a pattern that contradicts what the graphs have deterministically recorded without being flagged as drift.
Decision Graphs
Apollo maintains a series of specialized graphs rather than one monolithic graph. Each graph captures a different decision surface:
| Graph | Nodes | Edges |
|---|---|---|
| intent_tool_graph | Intent classes, tool identifiers | "Intent → tool chosen" with outcome weight |
| prompt_shape_graph | Prompt structure clusters | "Shape A evolved into shape B in later iteration" |
| service_routing_graph | Intent classes, backend services | "Intent → service picked" with outcome weight |
| outcome_graph | Decision points, outcome classes | "Decision → outcome produced" with frequency |
| iteration_graph | States within a layer-1 re-run chain | "Iteration N → Iteration N+1 decision delta" |
Cross-graph links exist where decisions in one graph point to nodes in another (e.g., a tool-selection node in intent_tool_graph links to the outcome node in outcome_graph).
Node and edge model
Each node carries:
- id, graph_id, kind, label
- Occurrence count, first-seen / last-seen timestamps
- Outcome distribution (aggregated from incoming observations)
- Tags for retrieval
Service-namespaced labels
Every label that is naturally service-scoped is prefixed with the
emitting service: <envelope.service>/<label>. Concretely, the
extractors namespace labels for the following node kinds:
| Graph | Kind | Example label |
|---|---|---|
| intent_tool_graph | intent, tool | cortex/screening, cortex/summarize |
| prompt_shape_graph | prompt_shape | oracle/shape:20:a3f1b2c0 |
| service_routing_graph | intent | parallax/screening |
| outcome_graph | decision_point | cortex/tool:summarize, oracle/conversation:conv_42 |
| iteration_graph | iteration_state | oracle/iter:trc_1234 |
Two node kinds are intentionally not prefixed:
- service nodes in service_routing_graph carry the service name itself as their identity (e.g., bare cortex). Prefixing would yield the meaningless label cortex/cortex.
- outcome nodes in outcome_graph carry universal categorical labels (success, error, feedback_up, feedback_down, feedback_abandoned). The per-service split is carried by the decision_point side of the edge, not by fragmenting the outcome taxonomy.
This rule means two backend services that register a tool with the same
name (e.g. cortex/summarize and parallax/summarize) form distinct
nodes and accumulate counts, EWMA weights, and outcome distributions
independently. Downstream synthesis (M8), drift detection (M12), and
evaluator scoring (M10) therefore operate on per-service signal rather
than a cross-service average.
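As a rough illustration of the namespacing rule, the helper below prefixes every label except those of the two exempt kinds. The name namespaced_label and the UNPREFIXED_KINDS constant are hypothetical; the actual extractors may structure this differently.

```python
# Node kinds whose labels are intentionally left bare (see the two cases above).
UNPREFIXED_KINDS = {"service", "outcome"}


def namespaced_label(service: str, kind: str, label: str) -> str:
    """Prefix a label with its emitting service unless the kind is exempt."""
    if kind in UNPREFIXED_KINDS:
        return label
    return f"{service}/{label}"


# Same tool name registered by two services yields two distinct node labels.
labels = {
    namespaced_label("cortex", "tool", "summarize"),
    namespaced_label("parallax", "tool", "summarize"),
}
```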
Each edge carries:
- source_id, target_id
- Weight (an outcome-correlation-adjusted transition probability)
- Count, first-seen / last-seen
- Recent-window weight (exponentially-weighted moving average over a short horizon)
- Long-window weight (EWMA over a long horizon)
The divergence between recent-window and long-window weights is the primary drift signal.
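A small sketch of the two-horizon EWMA and its divergence. The smoothing factors (0.3 and 0.02) are illustrative assumptions, not Apollo's configured values, and the class name EdgeWeights is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EdgeWeights:
    recent: float = 0.0  # short-horizon EWMA
    long: float = 0.0    # long-horizon EWMA

    def observe(self, outcome: float,
                alpha_recent: float = 0.3, alpha_long: float = 0.02) -> None:
        """Fold one outcome signal (e.g. 1.0 success, 0.0 failure) into both windows."""
        self.recent = alpha_recent * outcome + (1 - alpha_recent) * self.recent
        self.long = alpha_long * outcome + (1 - alpha_long) * self.long

    @property
    def divergence(self) -> float:
        """Absolute gap between the two windows, used here as the drift signal."""
        return abs(self.recent - self.long)
```

Because the short window reacts faster, a sudden change in outcomes opens a gap that closes again only if the new behaviour persists long enough for the long window to catch up.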
Graph updates (per observation, deterministic)
The Learner's extractors run deterministically on every ingested observation:
- Extract decision points (e.g., "intent class", "tool called", "service routed", "outcome class") using rules and lightweight matchers.
- Upsert nodes: create new if absent, increment count and update last-seen if present.
- Upsert edges: create new or reinforce. Update short-window and long-window weights.
- Attach the observation's trace_id to the affected nodes/edges for lineage queries.
No LLM call. No new free-form artifacts. Graph mutations only. This path is the grounding layer — it records what has actually happened in the system, with no interpretation.
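The upsert steps above can be sketched with plain dictionaries standing in for the node and edge stores. Everything here is illustrative: Tally, upsert, and apply_tool_observation are hypothetical names, the observation fields intent and tool are assumed, and weight updates are omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Tally:
    """Bookkeeping shared by nodes and edges in this sketch."""
    count: int = 0
    first_seen: str = ""
    last_seen: str = ""
    trace_ids: set = field(default_factory=set)


def upsert(store: dict, key, timestamp: str, trace_id: str) -> Tally:
    """Create the entry if absent, otherwise reinforce it."""
    entry = store.get(key)
    if entry is None:
        entry = store[key] = Tally(first_seen=timestamp)
    entry.count += 1
    entry.last_seen = timestamp
    entry.trace_ids.add(trace_id)  # kept for lineage queries
    return entry


def apply_tool_observation(nodes: dict, edges: dict, obs: dict) -> None:
    """Deterministic update for one 'intent -> tool chosen' decision."""
    intent = f'{obs["service"]}/{obs["intent"]}'
    tool = f'{obs["service"]}/{obs["tool"]}'
    for node_id in (intent, tool):
        upsert(nodes, node_id, obs["timestamp"], obs["trace_id"])
    upsert(edges, (intent, tool), obs["timestamp"], obs["trace_id"])
```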
Snapshots and trajectory
- Snapshots. Each graph is snapshotted on a cadence (default: hourly; configurable via APOLLO_GRAPH_SNAPSHOT_INTERVAL) into the Elastic apollo_graph_snapshots index. Snapshots are the substrate for past-vs-current comparison.
- Trajectory. A projection of near-future graph state from current EWMA velocities. Used by Guidance to pre-warm likely-next decisions and by drift detection to establish an expected trajectory.
LLM synthesis (event-driven, primary driver)
Apollo's LLM runs the primary synthesis engine and is event-driven, not scheduled. It is invoked in response to specific observation events — not on a timer, not on a batch threshold. The cadence of synthesis matches the cadence of actual system activity.
Synthesis triggers.
| Trigger | Inbound event |
|---|---|
| Layer 1 sends a request | intent_schema or user_prompt observation ingested |
| Layer 3 returns an output | tool_output, tool_error, or final_response ingested |
| Admin chat turn | POST /api/v1/apollo/chat request |
| Admin-initiated synthesis | POST /api/v1/apollo/learn request |
Other observation types (llm_turn from oracle itself) feed the graphs but do not trigger synthesis on their own — they are intermediate steps between a Layer 1 request and a Layer 3 output. Novel-intent synthesis occurs naturally on the Layer 1 / Layer 3 triggers above; no GET /guidance request from L1 or L3 ever drives synthesis (§Invariants 14).
Inputs on each synthesis call.
- The triggering observation (or chat turn)
- The relevant subgraph state from each decision graph (grounding context)
- Active artifacts that match the observation's intent/tool/service fingerprints
- Recent evaluator scores for matched artifacts
- Prior synthesis output for the same trace_id, if any (for continuity within a request lineage)
Outputs.
- Proposed new artifacts (IntentPattern, FailurePattern, etc.)
- Proposed edits to existing artifacts
- Proposed promotions/demotions
- Drift flags when the LLM itself detects divergence
- Compaction proposals for old observations near TTL
- For admin chat and admin-initiated triggers: a direct response returned to the caller
All outputs are structured Pydantic models. The Curator commits them only after the graph-anchor drift check (below) clears.
Concurrency. A burst of Layer 3 tool outputs (e.g., a fusion run with many tool calls) can trigger many near-simultaneous synthesis calls. Apollo bounds concurrent synthesis via APOLLO_SYNTHESIS_MAX_CONCURRENT (default 4) with a queue of pending triggers. Duplicate triggers within the same trace_id are coalesced: only the latest observation in a lineage is processed.
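One way to picture bounded, coalescing synthesis is an asyncio semaphore plus a pending map keyed by trace_id. This is a sketch under assumptions, not the coalescer.py implementation: SynthesisQueue and its methods are hypothetical, and max_concurrent stands in for APOLLO_SYNTHESIS_MAX_CONCURRENT.

```python
import asyncio


class SynthesisQueue:
    def __init__(self, synthesize, max_concurrent: int = 4):
        self._synthesize = synthesize                  # async callable(observation)
        self._semaphore = asyncio.Semaphore(max_concurrent)
        self._pending = {}                             # trace_id -> latest observation
        self._tasks = set()

    def submit(self, observation: dict) -> None:
        """Queue a trigger; a newer observation for the same trace replaces the older."""
        trace_id = observation["trace_id"]
        already_queued = trace_id in self._pending
        self._pending[trace_id] = observation
        if not already_queued:
            task = asyncio.create_task(self._run(trace_id))
            self._tasks.add(task)
            task.add_done_callback(self._tasks.discard)

    async def _run(self, trace_id: str) -> None:
        async with self._semaphore:                    # at most max_concurrent at once
            observation = self._pending.pop(trace_id)  # whatever is latest by now
            await self._synthesize(observation)

    async def drain(self) -> None:
        """Wait until every queued trigger has been processed."""
        while self._tasks:
            await asyncio.gather(*list(self._tasks))
```

An observation that arrives after its trace has already been popped for synthesis simply starts a new queued trigger, so nothing is lost; only triggers still waiting in the queue are collapsed.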
Graph-anchor drift check
The graphs are the anti-drift mechanism. Every LLM synthesis output is validated against the graphs before the Curator commits it:
- Proposed pattern vs. recorded edges. If the LLM proposes "tool X is typically followed by tool Y" but the intent_tool_graph edge X→Y has low weight or is absent, the proposal is flagged.
- Proposed intent classification vs. node clusters. If the LLM introduces an intent class that does not correspond to any node cluster in the graphs, flagged.
- Weight swings. If the LLM's proposal would effectively invert a strongly-weighted edge, flagged — even if the LLM's reasoning is plausible, this is exactly the shape of drift.
- Trajectory coherence. If the LLM's proposed trajectory diverges from the graph's EWMA projection, flagged.
Flagged outputs produce a DriftEvent artifact. The Curator does not commit a flagged proposal autonomously — admin review is required via chat or the audit surface. This is how the graphs protect the LLM from itself.
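The first check (proposed pattern vs. recorded edges) might look like the sketch below. The function name, the edge_weights mapping, the reason strings, and the 0.2 floor are all assumptions made for illustration; the document does not specify the actual threshold.

```python
from typing import Optional


def check_pairing_proposal(edge_weights: dict, source: str, target: str,
                           min_weight: float = 0.2) -> Optional[str]:
    """Return None if the proposal agrees with the graph, else a flag reason."""
    weight = edge_weights.get((source, target))
    if weight is None:
        return "edge_absent"      # the graph never recorded X -> Y
    if weight < min_weight:
        return "edge_low_weight"  # recorded, but too weak to support the claim
    return None
```

A non-None result would be turned into a DriftEvent and held for admin review rather than committed.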
Drift vs. evolution
The graph-anchor check distinguishes:
- Evolution — LLM synthesis outputs consistent with graph trajectory; graph weights shift smoothly as observations accumulate. Proposals are committed autonomously by the Curator.
- Drift — LLM synthesis outputs diverge from graph state; sudden edge-weight swings; emergent nodes appearing faster than configured rate caps. Proposals are held for admin review.
Thresholds are per-graph and configurable (z-score on weight deltas, rate-of-new-nodes caps, divergence tolerance on LLM outputs).
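For the z-score threshold, a minimal sketch using the standard library is shown below. It assumes a per-edge history of past weight deltas is available and uses an illustrative threshold of 3.0; both function names are hypothetical.

```python
from statistics import mean, pstdev


def weight_delta_zscore(history, latest: float) -> float:
    """z-score of the latest weight delta against past deltas (history non-empty)."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # No past variation: any change at all is infinitely surprising.
        return 0.0 if latest == mu else float("inf")
    return (latest - mu) / sigma


def is_drift(history, latest: float, z_threshold: float = 3.0) -> bool:
    """True when the latest delta is an outlier relative to the edge's history."""
    return abs(weight_delta_zscore(history, latest)) > z_threshold
```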
Storage
- Graph nodes and edges live in the Elastic apollo_graph_nodes and apollo_graph_edges indices (UDS-backed, per SPEC-01 invariant 2).
- A working in-memory mirror of the active graphs is maintained for hot-path reads (guidance, drift detection). The in-memory mirror is derived state; it is always rebuildable from Elastic.
- Snapshots live in apollo_graph_snapshots. Snapshots are immutable after write.
Guidance API (admin inspection only)
Apollo delivers guidance to L1 and L3 LLMs exclusively via the response-attached Injection Channel (§Injection Channel); those layers never pull it — there are no GET calls from L1/L3 in the runtime path. The GET /guidance* endpoints below are retained only as admin inspection tools — admins and admin-chat tooling preview what Apollo would currently inject for a given intent, layer, or subscriber. They require role admin via oracle's guardrails (SPEC-03 §Guardrails). L3 operational libraries (no LLM) receive no guidance — they emit observations and are otherwise opaque to Apollo.
Endpoints
GET /api/v1/apollo/guidance?intent=<query>&layer=1|3
GET /api/v1/apollo/guidance/schemas
GET /api/v1/apollo/guidance/tools
GET /api/v1/apollo/guidance/specs
GET /api/v1/apollo/guidance/connections
The top-level /guidance endpoint accepts an intent description (free text or structured) and the consuming layer, and returns a ranked set of applicable artifacts — previewing what Apollo would currently inject. The sub-paths return filtered artifact views by type.
All endpoints require the admin role. L1 and L3 never call them (§Invariants 14).
Example response
{
  "intent_match": {"pattern_id": "ipat_abc", "score": 0.88},
  "schemas": [...],
  "tools": [
    {"name": "fusion_run_start", "description_override": "...", "routing_hint": "parallax"}
  ],
  "specs": [
    {"id": "spec_frag_123", "content": "For federate alignment, ensure lens binding..."}
  ],
  "connections": [
    {"from": "layer1.screening_intent", "to": "parallax.fusion", "confidence": 0.91}
  ]
}
Injection Channel
Apollo delivers guidance to L1 and L3 LLMs by attaching it to the existing request/response flow — symmetric piggybacking in both directions. There is no separate push transport, no long-lived connection, no service token, no SSE client in production. Guidance is computed at request time (in-process inside oracle, since Apollo lives there) and embedded in the envelope that was already travelling.
- L1 path: oracle attaches current applicable guidance to every /chat response body.
- L3 path: oracle attaches current applicable guidance to every outbound MCP tool dispatch.
Both paths are fresh-per-call by construction — there is no cache to go stale, no reconnect to replay, no disconnected subscriber to reconcile. Apollo lives inside oracle, so fetching guidance for an outbound envelope is an in-process Python call, not a network hop.
Guidance communication is unidirectional: Apollo → L1, Apollo → L3. Subscribers never POST guidance back (observation ingest is a separate path — §Ingest Semantics). Captured as §Invariants 14.
Why response-attached instead of a push channel
L1 is only doing LLM work when composing a response to the user's latest message — the act of calling /chat is what triggers that work. Any guidance change Apollo makes while L1 is idle has nothing to apply to until the next /chat, at which point the response can carry the freshest state. A separate push channel for idle L1 therefore provides no observable benefit and introduces a long-lived auth session to maintain.
L3 agents only exist inside a user-request context (oracle dispatches to them; they validate the forwarded user token). There is no service-token mechanism in axonis-core today (see §Authentication & Authorization). A long-lived L3 connection would require inventing one. Attaching guidance to the MCP dispatch uses the existing user-token-forwarding pattern and delivers guidance exactly when the agent needs it.
L1 path: attached to /chat responses
POST /api/v1/chat is oracle's user-facing chat surface, driven by oracle's own LLM tool-use loop (oracle/server/llm/tool_executor.py, 5 providers: anthropic / openai / groq / ollama / trinity). It is distinct from Apollo's admin chat at POST /api/v1/apollo/chat, which runs Apollo's separate MiniMax LLM for talking to Apollo's synthesis brain.
When oracle responds to a POST /chat, it calls Apollo's in-process oracle.guidance.for_l1(user=..., intent_context=...) before serializing, and embeds the result on the response envelope under apollo_guidance. Beacon-style L1 clients consume that field via their local ApolloGuidanceCache.update(...).
Model extension. Oracle's existing ChatResponse Pydantic model (oracle/server/api/routes.py) must be extended with an optional field:
class ChatResponse(BaseModel):
    response: str
    conversation_id: str
    tool_calls: list = Field(default_factory=list)
    model_used: str = ""
    tokens: dict = Field(default_factory=dict)
    apollo_guidance: dict | None = Field(default=None)  # added by SPEC-14
The field defaults to None, so pre-Apollo clients and responses where guidance is omitted (attach-timeout, Apollo unavailable, empty applicable set) serialize identically to today. Clients that don't know about the field simply ignore it.
Envelope shape when guidance is present:
{
  "response": "...assistant reply...",
  "conversation_id": "...",
  "tool_calls": [...],
  "model_used": "...",
  "tokens": {...},
  "apollo_guidance": {
    "as_of": "2026-04-17T10:30:00Z",
    "artifacts": [
      {
        "id": "pshim_xyz",
        "type": "PromptShim",
        "version": 7,
        "content": { ... },
        "applicability": { "intent_class": "...", "tags": [...] },
        "rationale": "Human-readable explanation of why this artifact is active now."
      }
    ],
    "rationale_summary": "3 PromptShim (s1,s2,s3); 2 ToolPairingHint (h1,h2) +1 capped (c1)"
  }
}
L1 receives the response, hands apollo_guidance.artifacts to its local ApolloGuidanceCache, and renders the assistant message. The payload is the complete applicable set for this user's L1 scope — not a diff — so cache replacement is strictly idempotent.
On the next user turn, L1 uses the freshly-populated cache to compose its prompt. Guidance staleness is bounded by a single turn.
L2 path: in-process cache for oracle's own chat LLM
Oracle's chat surface at POST /api/v1/chat runs its own LLM tool-use loop in oracle/server/llm/tool_executor.py (5-provider gateway: anthropic / openai / groq / ollama / trinity). Oracle is therefore also a guidance subscriber for its own LLM — distinct from L1 (beacon's LLM) and L3 (cortex/parallax's LLMs).
Because oracle hosts Apollo, no transport is needed. Oracle owns a process-local ApolloGuidanceCache populated directly from oracle.guidance.for_l2(...) (analogous to for_l1 and for_l3_agent) before each LLM turn. The tool-executor consults the cache via the canonical accessors (get_system_prompt_additions, get_spec_fragments, get_active_failure_patterns, get_tool_pairing_hints, get_tool_description_overrides, get_service_connection_hints) on every turn and folds the results into its system prompt and tool-catalog rendering, exactly as L1 and L3 subscribers do.
The L2 path is symmetric with L1/L3 in artifact applicability filtering (scope=l2 on the attacher), in the timeout budget (APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS), and in the failure posture (cache miss / timeout → tool-executor proceeds with no guidance, request still succeeds). It differs in transport only: no JSON serialisation, no envelope traversal — a direct in-process call.
L3 path: attached to MCP tool dispatches
When oracle dispatches a tool call to an L3 agent (component_kind == "agent"), it attaches Apollo's currently-applicable guidance inside the tool's arguments dict under the apollo_guidance key, mirroring the existing pattern oracle uses to inject llm_spec into arguments (oracle/server/mcp/server.py). This keeps the JSON-RPC envelope shape unchanged (params stays {name, arguments}) and requires no MCP handler changes on the agent side beyond extracting and applying the new argument:
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "fusion_run_start",
    "arguments": {
      "...tool-specific args...": "...",
      "apollo_guidance": {
        "as_of": "2026-04-17T10:30:00Z",
        "artifacts": [ ... ],
        "rationale_summary": "..."
      }
    }
  }
}
L3 agent-side MCP handlers extract apollo_guidance from arguments (the same way they currently extract llm_spec), hand it to their local ApolloGuidanceCache for the duration of this request's LLM turns, and strip it before passing the remaining arguments to the tool's business logic. Because L3 only acts inside a user-request context, cache lifetime naturally scopes to the request — no background state, no long-lived connection, no service-token novelty.
L3 libraries (component_kind == "library") do not receive apollo_guidance in their dispatches — oracle filters them out before serialization. Libraries have no LLM to improve (§Invariants 15).
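The attach, extract, and strip steps described above can be sketched as two small helpers. This is a minimal illustration, not oracle's code: the function names `build_dispatch_arguments` and `split_guidance` are hypothetical, and only the `apollo_guidance` key and the `component_kind` values come from this section.

```python
from typing import Any, Optional

GUIDANCE_KEY = "apollo_guidance"


def build_dispatch_arguments(
    tool_args: dict[str, Any],
    guidance: Optional[dict[str, Any]],
    component_kind: str,
) -> dict[str, Any]:
    """Oracle side: attach guidance for agents only; libraries never receive it."""
    arguments = dict(tool_args)  # copy so the caller's dict is not mutated
    if component_kind == "agent" and guidance:
        arguments[GUIDANCE_KEY] = guidance
    return arguments


def split_guidance(
    arguments: dict[str, Any],
) -> tuple[dict[str, Any], Optional[dict[str, Any]]]:
    """Agent side: return (business_args, guidance) without mutating the input."""
    business_args = {k: v for k, v in arguments.items() if k != GUIDANCE_KEY}
    return business_args, arguments.get(GUIDANCE_KEY)
```

An agent handler would pass the second element to its request-scoped cache and the first element to the tool's business logic, so the tool never sees the guidance block.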
Payload shape
apollo_guidance carries:
- as_of — timestamp of the artifact snapshot. Used for traceability and admin debugging.
- artifacts — the currently-applicable artifact set for the subscriber's scope. Each artifact has id, type, version, content, applicability, and rationale (see §Rationale and evidence).
- rationale_summary — structured one-liner naming the attached artifact IDs per type, plus a +N capped (...) tail for artifacts the per-type cap held back. See §Prioritization Layers → Layer 5 for the exact format and the parallel aggregate_artifact_stats query for per-artifact stats.
There is no injection_id, no reason/trigger enum, no subscriber_scope, no evidence_ref on the per-call payload. That metadata lives in the audit log (§Audit log) — attaching it to every response/dispatch would balloon payload size with data that matters to admins, not to LLMs.
Freshness and ordering
Guidance is always at most one turn stale from each subscriber's perspective:
- L1's next /chat call sees the freshest guidance. Between turns, L1's cache reflects the guidance as-of the most recent response.
- L3's MCP dispatch carries guidance computed at the instant oracle is about to call. By construction the agent sees guidance current at dispatch time.
Because the cache is overwritten on every inbound response/dispatch, there is no "subscriber drift" problem to solve — the cache cannot diverge from Apollo.
Triggers (synthesis unchanged)
Apollo's Curator still commits artifact mutations event-driven (§Learner → LLM synthesis). The commits no longer trigger separate push events — they simply become the state that the next attached apollo_guidance payload reflects. Pause/resume of the Curator is therefore also a passive effect: paused Curator → artifact set stops changing → subscribers keep receiving the same state on subsequent calls.
Failure posture
- Apollo slow: oracle's guidance-fetch call has a strict in-process budget (APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS, default 10 ms). On overshoot, oracle serializes the response/dispatch without apollo_guidance. Subscribers proceed without guidance on that turn — equivalent to pre-Apollo behavior. No user-visible failure.
- Apollo has no applicable guidance: apollo_guidance is omitted (or null). Subscribers proceed without guidance. Normal state during Phase 1.
- Apollo hallucinates a bad artifact: subscribers apply it on one turn; the Evaluator detects outcome degradation, demotes; the next attached payload reflects the demotion. Admin can force rollback at any time.
- No network partition risk: Apollo is in-process with oracle. There is no network path between them that can fail.
- Curator paused: attached payloads continue to reflect the state as-of the pause. Subscribers see frozen guidance until resume. Because every response/dispatch still carries the current set, subscribers never lose their guidance due to the pause — it just stops changing.
Rationale and evidence
Each artifact in the attached payload carries a rationale string (LLM-synthesized for LLM-driven proposals; templated from score decomposition for deterministic Evaluator actions). This is the same rationale written into apollo_audit (§Audit log). Subscribers may log it when applying the artifact to a prompt; admins query it via audit log or admin chat (§Admin Chat).
The fuller evidence_ref (pointers to observations, graph snapshot id, score decomposition, related drift events) is not carried in the per-call payload — it lives in apollo_audit. Admins retrieve it via explain_decision / discuss_decision in admin chat, which resolves the audit record.
Audit
Every Curator action writes an apollo_audit record with action, actor, trigger, rationale, and evidence_ref (§Audit log). Individual deliveries — attached payloads on responses and dispatches — are not audited. Delivery would produce one record per user turn per layer, far too noisy to be useful. The audit captures decisions; deliveries are implementation detail.
Subscriber SDK: ApolloGuidanceCache (pure local cache)
ApolloGuidanceCache in axonis-core is a pure in-memory cache with no transport. It has two surfaces:
Update (called by the subscriber's request handler):
- cache.update(apollo_guidance_block) — replaces the cache's artifact set with the payload. Idempotent; the payload is the complete applicable set, not a diff.
Canonical accessors (consumed by the subscriber's LLM-turn codepath):
| Method | Returns | Used at |
|---|---|---|
| get_system_prompt_additions(intent_context) | Ordered list of PromptShim bodies | System-prompt construction |
| get_spec_fragments(intent_context) | List of SpecFragment | RAG-like context insertion |
| get_tool_description_overrides(tool_name) | Override dict or None | Tool-catalog rendering |
| get_tool_pairing_hints(current_tool) | List of ToolPairingHint | After-tool-call reasoning |
| get_active_failure_patterns(intent_context) | List of FailurePattern with diagnostic hints | Pre-call guard; post-call error interpretation |
| get_service_connection_hints(intent_context) | List of ServiceConnectionHint | Service routing |
Applicability filtering happens inside the cache: each artifact's applicability block is matched against the caller's intent_context. When multiple artifacts of the same type match, the SDK returns them ordered by (weight desc, recency desc); merge policy past ordering is the agent's choice.
No HTTP client, no long-lived connection, no authentication inside the SDK — the cache is a data structure inside the subscriber's process. SPEC-01 invariant 1 (axonis-core has no ML dependencies) is preserved; ApolloGuidanceCache is pure Python data structures.
Empty-cache fallback: if no apollo_guidance has yet been delivered to this subscriber (first call, Apollo off, APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS overshoot), all accessors return empty lists / None. The subscriber proceeds without guidance. This is the safe default pre-Apollo behavior.
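A minimal sketch of the update-and-read cycle follows. It is a stand-in, not the axonis-core class: `GuidanceCacheSketch` is a hypothetical name, the applicability match checks only `intent_class`, the `content.text` and `content.weight` fields are assumed, and `version` is used as a proxy for recency.

```python
from typing import Any


class GuidanceCacheSketch:
    """Illustrative stand-in for ApolloGuidanceCache (pure in-memory data)."""

    def __init__(self) -> None:
        self._artifacts: list[dict[str, Any]] = []

    def update(self, block: dict[str, Any]) -> None:
        # Full replacement, never a merge: the payload is the complete set,
        # so applying the same block twice leaves the cache unchanged.
        self._artifacts = list(block.get("artifacts", []))

    def _select(
        self, artifact_type: str, intent_context: dict[str, Any]
    ) -> list[dict[str, Any]]:
        hits = []
        for artifact in self._artifacts:
            if artifact.get("type") != artifact_type:
                continue
            wanted = (artifact.get("applicability") or {}).get("intent_class")
            # Assumption: an unset intent_class matches every intent.
            if wanted and wanted != intent_context.get("intent_class"):
                continue
            hits.append(artifact)
        # (weight desc, recency desc); version stands in for recency here.
        hits.sort(
            key=lambda a: (
                a.get("content", {}).get("weight", 1.0),
                a.get("version", 0),
            ),
            reverse=True,
        )
        return hits

    def get_system_prompt_additions(
        self, intent_context: dict[str, Any]
    ) -> list[str]:
        shims = self._select("PromptShim", intent_context)
        return [a.get("content", {}).get("text", "") for a in shims]
```

Before any `update` call the accessor returns an empty list, which corresponds to the empty-cache fallback: the subscriber composes its prompt exactly as it did before Apollo.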
Admin inspection
Admins can preview what Apollo would attach on the next request:
| Method | Path | Purpose |
|---|---|---|
| GET | /api/v1/apollo/guidance?scope=l1 | Preview current L1-scoped artifact set |
| GET | /api/v1/apollo/guidance?scope=l3:<service_name> | Preview current L3-scoped artifact set for a given agent |
| GET | /api/v1/apollo/guidance/stream?scope=<scope> | Admin-only SSE feed of Curator commits in real time (debugging) |
The SSE feed is a debugging aid only — production delivery never uses it. All admin inspection endpoints require the admin role.
Prioritization Layers
The attacher must not only find applicable artifacts but choose which subset reaches the receiver's LLM: a naive "match everything, send everything" bloats the receiver's prompt budget as the artifact set grows and makes operator-promoted artifacts indistinguishable from low-value ones. Prioritization is implemented as seven cooperating layers that make selection observable, quality-aware, and bounded. They are ordered by data dependency — earlier layers don't depend on later ones, and each stays useful even when the layers above it are disabled.
Layer 1 — Capped-artifact observability
When the per-type attach cap drops an artifact (see Layer 2 for the cap mechanism itself), each held-back artifact gets a row in apollo_lineage_events with kind: "capped", the artifact's artifact_type, the call's scope and trace_id. Two query paths read these rows:
- query_capped_for_artifact(artifact_id, *, service_name=None, limit=500) — list traces where this artifact was capped.
- aggregate_artifact_stats(artifact_id, *, since=None, limit=1000) — {attached_count, capped_count, last_attached_at, last_capped_at}.
Both surfaces are exposed on the admin REST API as GET /lineage/capped and GET /artifacts/{artifact_id}/stats. The same lineage rows are also available to the evaluator for "matched-but-shadowed" diagnostics.
Invariant. query_traces_with_artifact and query_trace_attribution filter kind: "capped" out by default — the "applied" semantics of /lineage is unchanged.
Layer 2 — Selection sort key
oracle.guidance.attacher._sort_key orders matched artifacts before the cap fires. Each tier has a default that preserves the previous tier's behavior, so the chain stays well-defined even with sparse data:
| Tier | Source | Default when missing |
|---|---|---|
| 1 | content.evaluator_score | 1.0 (innocent until signaled) |
| 2 | content.confidence | 0.0 (no opinion stated) |
| 3 | applicability specificity (count of populated narrowing fields) | 0 |
| 4 | content.weight | 1.0 |
| 5 | as_of | "" |
evaluator_score defaults to 1.0 to match ArtifactScore.score's baseline (a never-signaled artifact is treated as innocent). confidence defaults to 0.0 because synthesis confidence is an opt-in endorsement — absence means "no opinion." Specificity activates today and is the practical lever when the upper tiers tie; tiers 1 and 2 become load-bearing once their sources flow (see Layers 4-A and 4-B).
Per-type caps live in config:
APOLLO_ATTACH_CAP_PROMPT_SHIM=10
APOLLO_ATTACH_CAP_SPEC_FRAGMENT=5
APOLLO_ATTACH_CAP_TOOL_PAIRING_HINT=5
APOLLO_ATTACH_CAP_FAILURE_PATTERN=10
APOLLO_ATTACH_CAP_SERVICE_CONNECTION_HINT=5
APOLLO_ATTACH_CAP_INTENT_PATTERN=5
ApolloGuidanceCache._sorted on the receiver side uses an identical priority key so the order the sender selected is preserved through to the LLM.
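The tiered key and the per-type cap can be sketched as a tuple sort followed by a slice. This is an illustration under assumptions, not `oracle.guidance.attacher`: the names `sort_key` and `select_with_caps` are hypothetical, every tier is assumed to sort descending, and `as_of` is assumed to be readable on each artifact.

```python
from typing import Any


def sort_key(artifact: dict[str, Any]) -> tuple:
    """Five-tier key; each missing signal falls back to its documented default."""
    content = artifact.get("content", {})
    applicability = artifact.get("applicability", {})
    # Specificity: how many narrowing fields are actually populated.
    specificity = sum(1 for value in applicability.values() if value)
    return (
        content.get("evaluator_score", 1.0),  # tier 1: innocent until signaled
        content.get("confidence", 0.0),       # tier 2: no opinion stated
        specificity,                          # tier 3
        content.get("weight", 1.0),           # tier 4
        artifact.get("as_of", ""),            # tier 5
    )


def select_with_caps(
    matched: list[dict[str, Any]], caps: dict[str, int]
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Rank each type's matches, keep the top `cap`, report the rest as capped."""
    by_type: dict[str, list[dict[str, Any]]] = {}
    for artifact in matched:
        by_type.setdefault(artifact["type"], []).append(artifact)
    attached: list[dict[str, Any]] = []
    capped: list[dict[str, Any]] = []
    for artifact_type, group in by_type.items():
        ranked = sorted(group, key=sort_key, reverse=True)
        cap = caps.get(artifact_type, len(ranked))  # no cap configured: keep all
        attached.extend(ranked[:cap])
        capped.extend(ranked[cap:])
    return attached, capped
```

The second return value is the set that Layer 1 would record as `kind: "capped"` lineage rows.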
Layer 3 — Signal preservation at promote
The promote action's content-extraction helper (_content_from_proposal) strips proposal metadata before storing on the artifact. The three ranking signals (evaluator_score, confidence, weight) must not be added to the metadata strip-list. The constants _METADATA_KEYS and _RANKING_SIGNALS in apollo/curator/actions.py make this contract explicit; TestRankingSignalContract enforces it.
Invariant. If a proposal carries evaluator_score, confidence, or weight at the top level, the promoted artifact's content must carry them too.
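The contract can be illustrated with two disjoint key sets and a filter. This is a sketch, not the code in apollo/curator/actions.py: the members of `_METADATA_KEYS` below are hypothetical placeholders, and `content_from_proposal` and `check_ranking_signal_contract` are illustrative names.

```python
from typing import Any

# Hypothetical metadata keys; the real strip-list is not shown in this section.
_METADATA_KEYS = frozenset({"proposal_id", "proposed_at", "proposal_status"})
# The three ranking signals named by the contract.
_RANKING_SIGNALS = frozenset({"evaluator_score", "confidence", "weight"})


def check_ranking_signal_contract() -> None:
    """Fail loudly if a ranking signal ever lands on the strip-list."""
    overlap = _METADATA_KEYS & _RANKING_SIGNALS
    if overlap:
        raise AssertionError(f"ranking signals would be stripped: {sorted(overlap)}")


def content_from_proposal(proposal: dict[str, Any]) -> dict[str, Any]:
    """Drop proposal metadata; everything else, including signals, is kept."""
    return {k: v for k, v in proposal.items() if k not in _METADATA_KEYS}
```

A test in the spirit of TestRankingSignalContract would call `check_ranking_signal_contract()` and also assert that a proposal carrying all three signals yields content carrying all three.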
Layer 4-A — Evaluator score writeback
apollo/evaluator/persist.py:persist_score_to_artifact writes content.evaluator_score and content.score_decomposition to the artifact document after every signal application in the ingest worker. It uses a Painless script so that the type-specific content fields (text, signature, etc.) are preserved rather than overwritten.
Properties:
- Fire-and-forget. Never blocks the ingest hot path.
- Idempotent. retry_on_conflict=3 handles concurrent worker writes.
- Kill-switch. APOLLO_EVALUATOR_PERSIST_SCORES_ENABLED=false disables persistence without touching the in-memory engine (audit + cascade paths still work).
- Graceful degradation. Failures are logged and counted (apollo_evaluator_score_persist_failed_total); the in-memory engine remains authoritative.
Layer 4-B — Synthesis confidence
Every synthesis prompt (build_failure_pattern_prompt, build_intent_pattern_prompt, build_prompt_shim_prompt, build_sweep_prompt) requires the LLM to emit a top-level confidence: 0.0..1.0. The _SHARED_RULES block explains the semantics — reserve high confidence for patterns the model would stake its reputation on, because Apollo uses it to rank artifacts at attach time.
apollo/learner/synthesis.py:_normalize_confidence is called from _record_proposal and:
- Clamps values to [0.0, 1.0].
- Coerces missing or unparseable inputs to _NEUTRAL_CONFIDENCE = 0.5 so a malformed LLM response doesn't unfairly downrank an otherwise-valid proposal.
The normalized value rides on the proposal through promote (via the Layer 3 contract) onto artifact.content.confidence, where Layer 2's sort consumes it.
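The clamp-and-coerce behavior can be sketched in a few lines. The function below is an illustration of the two rules, not the body of `_normalize_confidence`; treating booleans and NaN as unparseable is an assumption.

```python
import math
from typing import Any

_NEUTRAL_CONFIDENCE = 0.5


def normalize_confidence(raw: Any) -> float:
    """Clamp to [0.0, 1.0]; missing or unparseable input becomes neutral."""
    # Assumption: booleans are treated as malformed rather than as 0/1.
    if raw is None or isinstance(raw, bool):
        return _NEUTRAL_CONFIDENCE
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return _NEUTRAL_CONFIDENCE
    if math.isnan(value):
        return _NEUTRAL_CONFIDENCE
    return min(1.0, max(0.0, value))
```

Note the difference from the Layer 2 default: a proposal that went through synthesis with a malformed value carries 0.5, while an artifact with no confidence field at all sorts as 0.0.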
Layer 5 — Deepened rationale_summary + per-artifact aggregation
oracle.guidance.attacher._summarize emits a structured summary that names attached and capped artifact IDs per type:
"3 PromptShim (s1,s2,s3); 2 ToolPairingHint (h1,h2) +1 capped (c1)"
ID lists truncate to _SUMMARY_ID_PREVIEW = 5 with a +N tail. Empty input still produces "". Types are sorted alphabetically so summaries diff cleanly across calls.
aggregate_artifact_stats (Layer 1) is the symmetric on-demand summary keyed by artifact rather than by attach call.
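The summary format can be reproduced with a short formatter. This is a sketch, not `_summarize` itself: the input shape (type name mapped to a list of IDs) and the exact spelling of the truncation tail are assumptions; only the example string, the preview length of 5, the alphabetical ordering, and the empty-input result come from the text above.

```python
_SUMMARY_ID_PREVIEW = 5


def _preview(ids: list[str]) -> str:
    """Join at most _SUMMARY_ID_PREVIEW ids; longer lists get a +N tail."""
    shown = ",".join(ids[:_SUMMARY_ID_PREVIEW])
    extra = len(ids) - _SUMMARY_ID_PREVIEW
    # Assumption: the tail is appended inside the parentheses as ",+N".
    return f"{shown},+{extra}" if extra > 0 else shown


def summarize(attached: dict[str, list[str]], capped: dict[str, list[str]]) -> str:
    """Build a per-type summary, with types in alphabetical order."""
    parts = []
    for artifact_type in sorted(set(attached) | set(capped)):
        kept = attached.get(artifact_type, [])
        held_back = capped.get(artifact_type, [])
        piece = f"{len(kept)} {artifact_type} ({_preview(kept)})"
        if held_back:
            piece += f" +{len(held_back)} capped ({_preview(held_back)})"
        parts.append(piece)
    return "; ".join(parts)
```

Sorting the type names before joining is what makes two summaries comparable line by line when the attached set changes between calls.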
Layer 6-A — Artifact embedding at promote
apollo/learner/similarity.py:compute_embedding reuses axonis.memory.embedder.embed (sentence-transformers, gated by the [memory] extra). The vector is stored on content.embedding_vector. Type-aware text extraction handles each artifact type's content shape (PromptShim text, FailurePattern signature+remediation, etc.).
Graceful degradation. When sentence-transformers is unavailable, compute_embedding returns None; the promote still succeeds with no embedding stored. Downstream similarity checks (6-B, 6-C) skip artifacts without embeddings.
Layer 6-B — Promote-time similarity advisory
After the embedding is computed, the promote handler scans active artifacts at the same (type, service_name, tool_name) scope and surfaces matches above the cosine threshold in ActionResult.similar_artifacts. Default threshold: APOLLO_SIMILARITY_THRESHOLD=0.9.
The advisory is informational only — promote still succeeds. Admin chooses whether to demote + supersede the prior(s) by re-promoting with supersede: true and the prior's IDs.
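The promote-time scan amounts to a cosine comparison against active artifacts in the same scope. The sketch below assumes a flat artifact shape with `scope` and `embedding_vector` keys and treats a score equal to the threshold as a match; the function names are hypothetical.

```python
import math
from typing import Any, Optional, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; a zero-length vector compares as 0.0."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_artifacts(
    new_vector: Optional[Sequence[float]],
    scope: tuple,
    active: list[dict[str, Any]],
    threshold: float = 0.9,
) -> list[tuple[str, float]]:
    """Return (artifact_id, score) pairs at or above threshold, best first."""
    if new_vector is None:  # embedding unavailable: the advisory is skipped
        return []
    matches = []
    for artifact in active:
        vector = artifact.get("embedding_vector")
        if vector is None or artifact.get("scope") != scope:
            continue
        score = cosine(new_vector, vector)
        if score >= threshold:
            matches.append((artifact["id"], score))
    return sorted(matches, key=lambda m: m[1], reverse=True)
```

Because the result is advisory, a caller would place it on the action result and continue with the promote regardless of what it contains.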
Layer 6-C — Curator-time similarity sweep
apollo/learner/coalescer.py:run_periodic is a fifth background loop alongside snapshot, curator-auto, maintenance, and synthesis-sweep. Each tick:
- Loads all status=active artifacts.
- Partitions by (type, service_name, tool_name).
- Within each partition, union-finds clusters where every pairwise cosine ≥ APOLLO_COALESCER_THRESHOLD (default 0.85, slightly looser than 6-B's promote-time threshold).
- For each cluster, calls Apollo's LLM via build_coalesce_prompt to write a coherent merger.
- Records the merger as a proposal on apollo_proposals with supersedes: [id1, id2, ...] so admin promote demotes the components atomically.
Bounded per sweep: APOLLO_COALESCER_MAX_CLUSTERS_PER_RUN=5 (defensive LLM-cost cap). Off by default (APOLLO_COALESCER_ENABLED=false) — operators opt in once they're ready to budget the LLM calls and review the proposals.
promote() extends the supersede flag's semantics: when the proposal carries supersedes: [...], each listed artifact is demoted alongside the new promote, in the same atomic batch.
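The partition-and-cluster step of the sweep can be sketched with a small union-find. This is an illustration, not the coalescer: `cluster_ids` is a hypothetical name and the similarity function is passed in by the caller. One caveat is stated as an assumption: plain union-find links any pair at or above the threshold and therefore yields connected components, so two members of a cluster may be joined only through a third; enforcing the stricter "every pairwise" condition would need an extra check per cluster.

```python
from itertools import combinations
from typing import Any, Callable


def cluster_ids(
    artifacts: list[dict[str, Any]],
    similarity: Callable[[dict[str, Any], dict[str, Any]], float],
    threshold: float = 0.85,
    max_clusters: int = 5,
) -> list[list[str]]:
    """Group artifact ids whose similarity links them within one scope."""
    parent = {a["id"]: a["id"] for a in artifacts}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def scope(a: dict[str, Any]) -> tuple:
        return (a["type"], a.get("service_name"), a.get("tool_name"))

    for left, right in combinations(artifacts, 2):
        if scope(left) != scope(right):
            continue  # never merge across partitions
        if similarity(left, right) >= threshold:
            parent[find(left["id"])] = find(right["id"])

    clusters: dict[str, list[str]] = {}
    for a in artifacts:
        clusters.setdefault(find(a["id"]), []).append(a["id"])
    # Singletons are not merge candidates; the cap bounds LLM calls per sweep.
    merged = sorted(sorted(ids) for ids in clusters.values() if len(ids) > 1)
    return merged[:max_clusters]
```

Each returned list would become the `supersedes` value of one merger proposal.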
Metrics surface
Each layer adds telemetry so operators can see what's happening:
| Metric | Source layer |
|---|---|
| apollo_guidance_attach_null_total{scope, reason} | observability over the attach path's null returns |
| apollo_guidance_attach_success_total{scope} | counterpart counter for successful attaches |
| apollo_guidance_attach_payload_bytes{scope} (histogram) | size growth — operators alert if it bloats |
| apollo_guidance_attach_artifact_count{scope} (histogram) | distribution of artifacts per attach |
| apollo_guidance_attach_capped_total{scope, artifact_type} | per-type drop counts (Layer 2 → 1) |
| apollo_evaluator_score_persisted_total / apollo_evaluator_score_persist_failed_total | Layer 4-A health |
| apollo_coalescer_proposals_emitted_total / apollo_coalescer_merge_failed_total | Layer 6-C health |
A guidance_health block on GET /stats summarizes per-scope success/null breakdown for at-a-glance review.
Disabling layers
Every layer can be turned off independently:
APOLLO_GUIDANCE_ATTACH_ENABLED=false # disables Layer 2 + everything above
APOLLO_EVALUATOR_PERSIST_SCORES_ENABLED=false # Layer 4-A
APOLLO_SIMILARITY_ENABLED=false # Layer 6-A + 6-B
APOLLO_COALESCER_ENABLED=false # Layer 6-C (default off)
APOLLO_LINEAGE_PERSIST_ENABLED=false # Layer 1
When disabled, the layer degrades to no-op; the rest of the system keeps running with the next-best signal.
Curator
The Curator is the only component empowered to mutate Apollo's memory. All mutations are bounded and auditable.
Allowed autonomous actions
- Promote an artifact (increase its weight in guidance results)
- Demote an artifact (hide from guidance without deleting)
- Forget an artifact (delete after it has been demoted for N evaluation cycles)
- Edit artifact metadata (tags, applicability, version, human-readable notes)
- Summarize / compact raw observations into a new artifact
Disallowed actions (hard invariants)
- Change auth or guardrails configuration
- Widen or narrow a caller's tool access
- Read or mutate another user's conversation data
- Mint tokens, escalate privileges, or bypass OAuth
- Call backend services on behalf of any user
- Modify or delete audit log records
Versioning
Apollo uses a two-tier versioning model. Artifacts are versioned per-mutation; graphs are captured via snapshots (see §Snapshots and trajectory and §Retention). Both are in place from Phase 1 — versioning is cheap to establish up front and impossible to reconstruct retroactively once Curator empowerment goes live in Phase 3.
Artifacts (IntentPattern, FailurePattern, PromptShim, SpecFragment, ToolPairingHint, ServiceConnectionHint, CapabilityMap, DecisionTrajectory, DriftEvent, IntentSchema, SchemaDrift, PromptShape). Every mutation — autonomous Curator action, admin edit, synthesis-proposed edit, rollback — produces a new version:
- Current version lives in apollo_artifacts.
- Every prior version is copied to apollo_artifact_history before the mutation.
- Each artifact record carries version, prev_version_id, change_reason, actor.
- apollo_artifact_history has no expires_ts — prior versions are retained indefinitely as the rollback substrate (§Retention).
- Rollback: POST /api/v1/apollo/artifacts/{id}/rollback with target version or prev_version_id replaces the current record and writes a new version whose prev_version_id points at the pre-rollback state (so rollback itself is a versioned event, recorded in audit).
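The copy-before-mutate rule and rollback-as-new-version can be sketched with an in-memory store. This is an illustration only: `ArtifactStoreSketch` and `version_id` are hypothetical, the two attributes stand in for the apollo_artifacts and apollo_artifact_history indices, and the sketch assumes the rolled-back version's prev_version_id points at the version that was current immediately before the rollback.

```python
import copy
from typing import Any


def version_id(record: dict[str, Any]) -> str:
    """Hypothetical id format for one stored version."""
    return f"{record['id']}@v{record['version']}"


class ArtifactStoreSketch:
    def __init__(self) -> None:
        self.current: dict[str, dict[str, Any]] = {}  # stands in for apollo_artifacts
        self.history: list[dict[str, Any]] = []       # stands in for apollo_artifact_history

    def mutate(
        self, artifact_id: str, content: dict[str, Any], change_reason: str, actor: str
    ) -> dict[str, Any]:
        previous = self.current.get(artifact_id)
        if previous is not None:
            # Copy the prior version to history before overwriting it.
            self.history.append(copy.deepcopy(previous))
        record = {
            "id": artifact_id,
            "version": previous["version"] + 1 if previous else 1,
            "prev_version_id": version_id(previous) if previous else None,
            "content": copy.deepcopy(content),
            "change_reason": change_reason,
            "actor": actor,
        }
        self.current[artifact_id] = record
        return record

    def rollback(self, artifact_id: str, to_version: int, actor: str) -> dict[str, Any]:
        for old in self.history:
            if old["id"] == artifact_id and old["version"] == to_version:
                # Rollback is itself a mutation, so it yields a new version.
                return self.mutate(
                    artifact_id, old["content"], f"rollback to v{to_version}", actor
                )
        raise KeyError(f"{artifact_id} has no stored version {to_version}")
```

Because nothing is ever removed from `history`, a rollback can itself be rolled back, which is the property the indefinite retention of prior versions is meant to guarantee.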
Graphs (DecisionGraph). Per-observation node/edge mutations are too high-frequency to version individually. Graph rollback uses snapshots instead:
- Hourly snapshots for 7 days, daily for 30 days, weekly for 90 days (per §Retention).
- Admin rollback on a graph restores from a prior snapshot. Coarser granularity than artifact rollback by design.
- Structural mutations initiated by admin or Curator on a graph (e.g., manually forgetting a node, merging two nodes) are tracked as audit events in apollo_audit with before/after snapshot IDs.
Audit log
Every Curator action, Evaluator-driven demotion, drift-hold, upstream artifact re-flag, and admin-chat state mutation writes a record to the Elastic apollo_audit index. The index follows the shared axonis Elastic convention (flat index, UDS shell, expires_ts, delete_by_query cleanup — see §Index mappings and templates and §Retention).
Record schema: ApolloAuditRecord (oracle/curator/audit.py), written via write_audit() and persisted through the ApolloAudit UDS store (oracle/memory/store.py). Beyond the standard UDS/expires_ts envelope it carries:
- Action axes — action (ActionKind: promote / demote / forget / edit / rollback / compact / drift_hold / upstream_flag / pause_curator / resume_curator), actor (curator_auto, evaluator_auto, or admin:<username>), and trigger (the free-form cause, e.g. evaluator_score_below_threshold, l3_performance_cascade, drift_event, admin_manual, synthesis_proposal).
- Target — artifact_id, artifact_type, before_version_id / after_version_id, and related_drift_event_id when trigger = drift_event.
- Scoring — nullable evaluator_score and score_decomposition (per §Evaluator outputs); upstream_artifact_ids when a cascade flagged parents.
- Explanation — rationale (REQUIRED, always present: LLM-synthesized for LLM-driven actions, templated from score_decomposition + trigger for deterministic ones) and evidence_ref (observation ids, graph snapshot id, related audit ids).
- Lifecycle — indefinite (true ⇒ null expires_ts, never purged) and optional admin-supplied admin_note.
Rationale vs. admin_note. rationale is Apollo's own account of why it acted — always present, always auto-generated. admin_note is the admin's own commentary when they take an action — optional, human-supplied. Both are preserved and queryable.
Retention. Default 90 days (configurable via APOLLO_AUDIT_RETENTION_DAYS), enforced by the maintenance task's delete_by_query on expires_ts. Records marked indefinite: true have a null expires_ts and are never deleted — used for critical admin actions (forget of an artifact, pause/resume of Curator, rollback of a versioned artifact). The admin API allows setting indefinite when taking such actions.
Queryable. GET /api/v1/apollo/audit supports filters on time range, action, actor, artifact id, artifact type, trigger, and score-decomposition terms (e.g., "all demotions triggered primarily by L3 errors last 7 days"). Score decompositions let admins see why a score moved without re-deriving it from observations.
Evaluator
The Evaluator scores artifacts based on outcome correlation: after an artifact is published to guidance, do subsequent traces that used it produce better outcomes than traces that did not?
Inputs
- Raw observations (trace outcomes)
- Artifact usage records (which artifacts were returned in guidance, which were incorporated)
- Explicit feedback signals
Failure signals (feeds the evaluator)
An event is considered a failure (negative signal for any artifact associated with its trace) if any of the following:
1. Layer 3 returned an error (HTTP 5xx or tool exception) — Layer 3 performance signal. Applies to both agent and library observations. Under oracle-sole-observer (§Invariants 14), the observation is emitted by oracle; the signal keys on the envelope's service field (the observed L3 target), not on who performed the HTTP POST.
2. Output schema mismatched the Layer 1 intent schema — Layer 3 performance signal. Applies only when the observed L3 service is an agent (component_kind == "agent" on its ServiceRegistry record). Libraries have no agent-level intent contract; their outputs are raw CRUD/compute results and schema mismatch is not evaluated for them. The Evaluator looks up component_kind by the envelope's service field at signal-application time — oracle is always the actual emitter, but the service it observed is what the contract keys on.
3. User feedback was negative (thumbs-down, correction, abandoned conversation)
4. Self-assessed evaluator confidence was below threshold
All four feed the Evaluator; signal 2 is gated on component_kind per the above. Weights are configurable via APOLLO_EVALUATOR_WEIGHTS.
Layer 3 performance carries amplified penalty
Signals 1 and 2 both reflect Layer 3 performance — what the backend services actually produced when acting on Apollo's guidance. If Layer 3 components are not performing well, that is a strong indication that the workflow generation (Layer 1 prompts) and the artifacts driving that generation need to be updated.
Accordingly, the Evaluator applies an amplified penalty to Layer 3 performance failures:
- Default weight tiers: L3_performance: 3.0, user_feedback: 1.5, evaluator_confidence: 0.5.
- Sustained L3 underperformance against a given artifact accelerates the Curator lifecycle:
  - Normal demotion cycle requires N=5 below-threshold evaluation cycles before forget.
  - L3-driven demotion triggers after N=2 cycles when signals 1 or 2 dominate the score. Rationale: if services are reliably failing on an artifact's guidance, waiting out a long demotion window lets bad guidance keep shaping traffic.
- When an artifact's score degradation is attributable primarily to Layer 3 signals, the Evaluator additionally flags the upstream artifacts — the IntentPattern, PromptShim, or SpecFragment that shaped the Layer 1 prompt which in turn produced the Layer 3 call — for LLM review on the next synthesis trigger. The synthesis LLM may propose edits to those upstream artifacts, creating a cross-layer correction cycle.
- Repeated L3 failures on the same artifact within a short window escalate to a DriftEvent (not just a score drop), forcing admin review rather than silent demotion.
Weights and thresholds are tunable via env vars (APOLLO_EVALUATOR_WEIGHT_L3_ERROR, APOLLO_EVALUATOR_WEIGHT_SCHEMA_MISMATCH, APOLLO_EVALUATOR_WEIGHT_USER_FEEDBACK, APOLLO_EVALUATOR_WEIGHT_CONFIDENCE, APOLLO_EVALUATOR_L3_FAST_DEMOTE_N).
Outputs
Per-artifact rolling score (exponential moving average). Scores feed the Curator's demote/forget policies. Scores are visible in admin stats and in the audit log when they trigger actions. Score decompositions (per-signal contributions) are retained so admins can see why a score moved — Layer 3 errors vs. user feedback vs. schema mismatch are distinguishable in the audit trail.
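The text does not give the scoring formula, so the following is only one plausible shape for a score decomposition, the fast-demote decision, and an exponential moving average. Everything here is an assumption except the default weights and the N=5 / N=2 cycle counts: both L3 signals are assumed to share the 3.0 tier, "dominate" is read as more than half of the weighted total, and the smoothing factor `alpha` is a made-up default.

```python
# Default tiers from the text; both L3 signals are assumed to use 3.0.
WEIGHTS = {
    "l3_error": 3.0,
    "schema_mismatch": 3.0,
    "user_feedback": 1.5,
    "evaluator_confidence": 0.5,
}
L3_SIGNALS = frozenset({"l3_error", "schema_mismatch"})


def decompose(failure_counts: dict[str, int]) -> dict[str, float]:
    """Per-signal weighted contribution, kept so admins can see why a score moved."""
    return {s: WEIGHTS[s] * n for s, n in failure_counts.items() if n > 0}


def l3_dominates(decomposition: dict[str, float]) -> bool:
    """Assumed reading of 'dominate': L3 signals exceed half the weighted total."""
    total = sum(decomposition.values())
    l3_part = sum(v for s, v in decomposition.items() if s in L3_SIGNALS)
    return total > 0 and l3_part / total > 0.5


def cycles_before_action(
    decomposition: dict[str, float], normal_n: int = 5, fast_n: int = 2
) -> int:
    """Below-threshold cycles to wait: shorter when L3 failures drive the score."""
    return fast_n if l3_dominates(decomposition) else normal_n


def ema(previous: float, observation: float, alpha: float = 0.2) -> float:
    """Standard exponential moving average update."""
    return alpha * observation + (1.0 - alpha) * previous
```

Keeping the decomposition as a dict alongside the rolled-up score is what lets an audit record distinguish Layer 3 errors from user feedback after the fact.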
Admin Chat
A conversational interface to Apollo, gated by role admin via oracle's existing guardrails (SPEC-03 §Guardrails).
POST /api/v1/apollo/chat
Request body mirrors oracle's /chat:
{
"message": "Forget everything Apollo learned about cohort X last week",
"conversation_id": "apollo_admin_sess_...",
"model": "default"
}
The admin chat uses Apollo's own LLM (separate from oracle's primary LLM) with a set of memory-management tools:
- list_memories(filter)
- get_memory(id)
- forget_memory(id)
- promote_artifact(id) / demote_artifact(id)
- rollback_artifact(id, to_version)
- rollback_graph(graph_id, to_snapshot)
- trigger_synthesis(trace_id?)
- explain_decision(trace_id | artifact_id | audit_id) — returns the rationale + evidence_ref for a Curator action
- list_decisions(artifact_id?, since?, trigger?) — audit-filtered view of recent Curator actions
- discuss_decision(artifact_id | audit_id) — opens a focused conversation thread: Apollo's LLM replies with the stored rationale, walks through the evidence (graph snapshot, score decomposition, upstream artifacts), and answers admin follow-ups. The admin can invoke promote/demote/rollback/forget tools inline in the same thread to act on the finding.
- pause_curator() / resume_curator()
Admin ↔ Apollo conversation
Every Curator action carries a rationale written by Apollo at commit time and persisted in apollo_audit. Admin chat is where those rationales become conversational: an admin asks "why did you just demote pshim_xyz?" and Apollo's LLM retrieves the audit record, reads out the rationale and evidence, and answers follow-ups by re-reading the underlying observations and graph state.
So admin chat is not just a command console — it is the review surface for Apollo's own findings. Admins probe rationales, challenge them, and issue corrections (rollback, forget, edit, pause) without leaving the conversation. Every follow-up action is itself audited to the index with actor: "admin:<username>" and a fresh rationale, preserving the admin's reasoning alongside Apollo's.
Non-admin users cannot reach /api/v1/apollo/chat; their interaction with Apollo is purely transitive, through oracle's own /api/v1/chat.
Endpoints
REST (mounted under oracle's /api/v1/apollo/)
| Method | Path | Who | Purpose |
|---|---|---|---|
| POST | /api/v1/apollo/observations | Admin + out-of-process services | Admin replay/seed, plus the fallback ingest path for services outside oracle's MCP dispatch reach. Phase-1 emitters (oracle + cortex) do not use this endpoint — oracle emits on their behalf in-process (§Ingest Semantics). |
| GET | /api/v1/apollo/guidance?scope=l1 | Admin | Preview current L1-scoped artifact set |
| GET | /api/v1/apollo/guidance?scope=l3:<service> | Admin | Preview current L3-scoped artifact set for an agent |
| GET | /api/v1/apollo/guidance/schemas | Admin | Inspect learned intent schemas |
| GET | /api/v1/apollo/guidance/tools | Admin | Inspect tool descriptions / routing hints |
| GET | /api/v1/apollo/guidance/specs | Admin | Inspect spec fragments |
| GET | /api/v1/apollo/guidance/connections | Admin | Inspect service-connection hints |
| GET | /api/v1/apollo/guidance/stream?scope=<scope> | Admin | Real-time SSE feed of Curator commits (debugging only) |
| POST | /api/v1/apollo/chat | Admin | Conversational admin interface |
| GET | /api/v1/apollo/memories | Admin | List observations with filters |
| GET | /api/v1/apollo/memories/{id} | Admin | Inspect one observation |
| POST | /api/v1/apollo/memories | Admin | Seed an observation manually |
| PATCH | /api/v1/apollo/memories/{id} | Admin | Edit metadata (tags, notes) |
| DELETE | /api/v1/apollo/memories/{id} | Admin | Forget |
| GET | /api/v1/apollo/artifacts | Admin | List learned artifacts |
| GET | /api/v1/apollo/artifacts/{id} | Admin | Inspect one artifact + version history |
| PATCH | /api/v1/apollo/artifacts/{id} | Admin | Edit |
| POST | /api/v1/apollo/artifacts/{id}/promote | Admin | Promote |
| POST | /api/v1/apollo/artifacts/{id}/demote | Admin | Demote |
| POST | /api/v1/apollo/artifacts/{id}/rollback | Admin | Revert to a prior version |
| DELETE | /api/v1/apollo/artifacts/{id} | Admin | Forget |
| GET | /api/v1/apollo/audit | Admin | Query audit log |
| POST | /api/v1/apollo/learn | Admin | Manually trigger an Apollo synthesis pass |
| GET | /api/v1/apollo/stats | Admin | Apollo's own observability (counts, timings, scores) |
MCP (admin chat tools)
Apollo's MCP tools mirror the admin CRUD surface, exposed only to Apollo's own admin chat LLM (§Admin Chat) — not aggregated into oracle's user-facing /agentspace MCP catalog. The tools are served from a private MCP endpoint mounted by oracle.oracle.chat.server and reachable only through the admin-chat conversation; they are never visible to L1 or L3 LLMs.
- apollo_list_memories, apollo_get_memory, apollo_forget_memory
- apollo_list_artifacts, apollo_get_artifact, apollo_promote_artifact, apollo_demote_artifact, apollo_rollback_artifact, apollo_forget_artifact
- apollo_list_graphs, apollo_get_graph_snapshot, apollo_rollback_graph
- apollo_query_audit
- apollo_trigger_synthesis
- apollo_list_decisions, apollo_explain_decision, apollo_discuss_decision
- apollo_pause_curator, apollo_resume_curator
- apollo_stats
Authentication & Authorization
- Admin endpoints require the admin role via oracle's OAuth middleware + guardrails (SPEC-03).
- Guidance GET endpoints are admin-only. L1 and L3 never call them. They exist for admin inspection of what Apollo would currently inject.
- Secondary ingest path (POST /api/v1/apollo/observations) accepts either the admin's Bearer token (for replay/seed) or, for any out-of-process emitter, the user's forwarded Bearer token, which is the same token oracle forwards downstream in its existing cross-service calls. There is no service-token infrastructure in axonis-core today; every cross-service call in the stack forwards the user's Keycloak-issued token (verified end-to-end against JWKS). Admin replay/seed additionally requires the admin role. Phase-1 emitters do not exercise this path.
- Oracle's primary in-process path (all L1-relayed events + oracle's own llm_turn + oracle-observed L3 tool_output / tool_error + final_response) bypasses network auth: it is a direct function call within the same process, already authenticated at the ingress by OAuthMiddleware.
- Neither L1 nor L3 authenticates to Apollo. Neither layer addresses Apollo on any path (ingest or guidance). Both talk to oracle; oracle handles Apollo (§Invariants 14).
- Injection channel (response-attached) rides the ambient auth of the envelope it is embedded in. The /chat response is already authenticated per the inbound /chat request; the outbound MCP dispatch is already authenticated per oracle's forwarded token. No additional auth layer is introduced for attached guidance.
- Admin SSE debug feed uses the same OAuthMiddleware on connection handshake and is gated to the admin role.
- Apollo honors all oracle guardrails. Curator cannot widen a caller's tool access. Attached guidance that references tools a subscriber cannot use is filtered out before the envelope is serialized.
- Deferred: once a Keycloak client-credentials grant is introduced for service-to-service auth (noted in SPEC-03 as pending), APOLLO_SERVICE_TOKEN-authenticated ingest from background/batch workers becomes possible. Until then, ingest without a user token context is not supported.
Ingest Semantics
Observation ingest has two paths. The primary path, used by every Phase-1 emitter (oracle + cortex), is in-process only — oracle observes the envelopes flowing across its own boundaries and calls oracle.oracle.observer.ingest directly. The secondary path is the HTTP POST endpoint, mounted for admin replay/seed and for future services running outside oracle's MCP dispatch reach.
Primary path: in-process emission by oracle
Per §Invariants 14, neither L1 nor L3 addresses Apollo directly. Oracle is Apollo's sole emitter in production. On every inbound /chat request, oracle extracts L1 signals from the request body and calls the observer in-process. On every outbound MCP dispatch, oracle observes the round-trip and emits in-process on the L3 service's behalf:
| Event(s) | Emitted when | Emitter call site |
|---|---|---|
| intent_schema, user_prompt, user_feedback | /chat request arrives or a feedback submission is posted | oracle/server/api/routes.py |
| llm_turn | oracle's own LLM request/response cycle completes | oracle/server/llm/tool_executor.py |
| tool_output, tool_error | an outbound MCP dispatch to an L3 service returns | oracle/server/llm/tool_executor.py + oracle/server/mcp/server.py (proxy path) |
| final_response | oracle is about to return the /chat response body | oracle/server/api/routes.py |
All emissions flow through helpers in oracle/oracle/hooks/chat.py which enqueue the envelope on the in-process async queue via oracle.oracle.observer.ingest.ingest(...). No network call. No authentication layer (the helpers live inside oracle's process, authenticated at the ingress by OAuthMiddleware). Failure modes are purely local: a full queue increments apollo_ingest_queue_dropped_total; an observer exception is caught and logged so the user request is unaffected.
Secondary path: HTTP POST (admin replay + out-of-process services)
The POST /api/v1/apollo/observations endpoint remains mounted on oracle's Starlette app for two use cases:
- Admin replay/seed: an admin manually re-ingests observations (e.g., to backfill after an outage or to seed synthetic test data). Requires the admin role.
- Services outside oracle's MCP dispatch reach: any future service whose outputs are not observable through an oracle-mediated MCP round-trip can emit via ApolloClient. None of the Phase-1 emitters use this path.
Endpoint:
POST /api/v1/apollo/observations
Content-Type: application/json
Authorization: Bearer <user-token> # admin token for replay, or the user's forwarded token for out-of-process services
traceparent: 00-<trace-id>-<parent-span-id>-<flags>
{ "observations": [<envelope>, ...] }
A single envelope is always valid; the array form enables batching on the client. Apollo responds 202 Accepted as soon as every envelope is placed on the in-process queue. Per-envelope validation happens inside the background worker and is logged (not bubbled to the caller) so a single bad envelope does not fail a batch.
The HTTP POST is a fire-and-accept call. Because Apollo's request handler does nothing but enqueue, the server-side operation is a local memory write — never a WAN hop inside the request. Client-side timeouts can therefore be generous (default 30 s) without risking silent drops from network jitter: the handler always responds in sub-millisecond time on a healthy Apollo.
Client-side helper: ApolloClient
ApolloClient in axonis-core is the HTTP client used by the secondary path. Phase-1 services (oracle + cortex) do not import it — oracle emits in-process and cortex emits nothing at all. ApolloClient is retained so admin tooling and any future out-of-process emitter can reach the endpoint without a bespoke HTTP client.
ApolloClient.emit(envelope) does a single httpx.AsyncClient.post with:
- A generous request timeout (APOLLO_INGEST_POST_TIMEOUT_SEC, default 30).
- Bounded retries with exponential backoff + jitter on transient failures (APOLLO_INGEST_RETRY_ATTEMPTS, default 2; APOLLO_INGEST_RETRY_BASE_MS, default 200; APOLLO_INGEST_RETRY_CAP_MS, default 2000). Transient = timeout, 5xx, 429, connection error. 4xx except 429 is not retried.
- Client-side batching via a size-or-interval hybrid: APOLLO_INGEST_BATCH_SIZE (default 50) or APOLLO_INGEST_FLUSH_INTERVAL_MS (default 500), whichever first.
- Lifecycle flush on process shutdown (signal handler + atexit) and on explicit ApolloClient.flush() calls.
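The retry policy in the list above can be sketched as follows. This is a minimal illustration, not ApolloClient's actual code: the function names are hypothetical, and the "full jitter" strategy (a uniform draw between zero and the capped exponential ceiling) is an assumption, since the spec only says "exponential backoff + jitter".

```python
import random


def backoff_delays_ms(attempts=2, base_ms=200, cap_ms=2000, rng=random.random):
    """Return one sleep duration (in ms) per retry attempt.

    Defaults mirror APOLLO_INGEST_RETRY_ATTEMPTS, APOLLO_INGEST_RETRY_BASE_MS
    and APOLLO_INGEST_RETRY_CAP_MS. Full jitter is an assumption; the spec
    does not name the jitter strategy.
    """
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap_ms, base_ms * (2 ** attempt))  # exponential growth, capped
        delays.append(rng() * ceiling)  # rng() is in [0, 1)
    return delays


def is_transient(status_code=None, exc=None):
    """Transient = timeout, connection error, 5xx, or 429; other 4xx is final.

    Built-in exception types are used as stand-ins here; a real client would
    map its HTTP library's timeout and connection errors onto this check.
    """
    if exc is not None:
        return isinstance(exc, (TimeoutError, ConnectionError))
    if status_code is None:
        return False
    return status_code == 429 or 500 <= status_code <= 599
```

With the defaults, the two retries wait at most 200 ms and 400 ms; the 2000 ms cap only comes into play if the attempt budget is raised.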
ApolloClient is pure HTTP — the same shape as axonis-core's RestClient and MCPClient (axonis_core/gateway/). No new transport primitive is introduced.
Server side: in-process async queue
Apollo's ingest handler is thin: it parses envelopes, hands each to ingest() (oracle/observer/ingest.py), and returns 202 as soon as every envelope is enqueued. ingest() never blocks the caller — it uses put_nowait and records a drop (apollo_ingest_queue_dropped_total) instead of awaiting backpressure; accepted envelopes increment apollo_ingest_accepted_total. Both the in-process path (oracle calls ingest() directly) and the HTTP path (the oracle.guidance.api route handler) funnel through the same entry point.
The queue is bounded by APOLLO_INGEST_QUEUE_MAXSIZE (default 10000). When the queue fills, put_nowait raises QueueFull and Apollo increments apollo_ingest_queue_dropped_total — the failure is never silent, visible on /stats under degraded_emitters.
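A minimal sketch of the non-blocking enqueue described above, assuming a plain asyncio.Queue and a Counter standing in for the real metrics registry. The function signature is illustrative and is not taken from oracle/observer/ingest.py.

```python
import asyncio
from collections import Counter

# Stand-in for the real metrics registry (hypothetical).
metrics = Counter()


def ingest(queue, envelope):
    """Enqueue without blocking; count a drop instead of awaiting backpressure."""
    service = envelope.get("service", "unknown")
    try:
        queue.put_nowait(envelope)
    except asyncio.QueueFull:
        metrics[("apollo_ingest_queue_dropped_total", service)] += 1
        return False
    metrics[("apollo_ingest_accepted_total", service)] += 1
    return True


async def demo():
    # APOLLO_INGEST_QUEUE_MAXSIZE defaults to 10000; 2 keeps the demo short.
    queue = asyncio.Queue(maxsize=2)
    return [ingest(queue, {"service": "cortex", "n": n}) for n in range(3)]
```

Running asyncio.run(demo()) accepts the first two envelopes and records the third as a drop, so the caller never waits on the queue.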
A pool of background worker coroutines (APOLLO_INGEST_WORKER_CONCURRENCY, default 4) drains the queue. Each worker performs the full ingest: normalize → write to apollo_observations → update graphs → dispatch synthesis triggers per §Learner. Worker failures are logged and the envelope is reprocessed on a bounded retry budget (APOLLO_INGEST_WORKER_RETRY_ATTEMPTS, default 2) before being moved to a dead-letter log (APOLLO_INGEST_DEAD_LETTER_PATH, optional JSONL file; unset by default).
Failure visibility
No silent failure modes exist on the ingest paths — primary (oracle in-process) and secondary (HTTP POST). Every failure kind is counted. The {service} label is the envelope's service field — the observed L3 target for Phase-1 emissions (oracle is the actual emitter but per-service visibility is what operators need).
| Metric | Meaning |
|---|---|
| apollo_ingest_accepted_total{service} | Envelopes successfully enqueued (both paths) |
| apollo_ingest_queue_dropped_total{service} | Envelopes rejected because the queue was full (both paths) |
| apollo_ingest_post_failure_total{service, kind} | Secondary-path POST failures after retries exhausted (timeout / 5xx / etc.). Never fires for Phase-1 emitters (they go in-process). |
| apollo_ingest_worker_failure_total{service} | Background-worker failures after retries (moved to dead-letter); applies to both paths |
| apollo_ingest_queue_depth | Current depth of the in-process queue |
| apollo_ingest_last_ingest_ts{service} | Timestamp of last successful enqueue per service; covers both oracle's in-process call and secondary-path POSTs |
| apollo_ingest_last_drain_ts{service} | Timestamp of last successful worker drain per service |
Services whose last_ingest_ts is older than APOLLO_INGEST_STALE_WARN_SEC (default 300) for a service that should be active are surfaced on /stats under degraded_emitters. For Phase-1 services, "degraded" means oracle stopped observing them (e.g., oracle hasn't dispatched an MCP call to cortex in five minutes) — not that a POST failed.
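The staleness rule can be sketched as a pure function. The name degraded_emitters is borrowed from the /stats field, but the signature, the expected_active input, and the treatment of a service with no recorded ingest as stale are assumptions made for illustration.

```python
import time


def degraded_emitters(last_ingest_ts, expected_active, now=None, stale_warn_sec=300):
    """Return the expected-active services whose last ingest is too old.

    last_ingest_ts maps service name to epoch seconds, mirroring
    apollo_ingest_last_ingest_ts{service}. A service with no recorded ingest
    is treated as stale (assumption).
    """
    now = time.time() if now is None else now
    stale = []
    for service in sorted(expected_active):
        ts = last_ingest_ts.get(service)
        if ts is None or now - ts > stale_warn_sec:
            stale.append(service)
    return stale
```

Services outside expected_active are never flagged, which matches the qualifier "for a service that should be active".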
Dedup on at-least-once delivery
Client retries can produce duplicate envelopes. Apollo's observer deduplicates on (trace_id, event_type, timestamp, service) within APOLLO_INGEST_DEDUPE_WINDOW_SEC (default 300) before writing to Elastic.
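A minimal in-memory sketch of windowed dedupe on that key. The class name and the eviction strategy are hypothetical; the key fields and the 300-second default come from the text above. The sketch assumes callers pass a non-decreasing clock value, so insertion order matches time order.

```python
from collections import OrderedDict


class DedupeWindow:
    """Remember envelope keys for window_sec seconds and flag repeats."""

    def __init__(self, window_sec=300):  # APOLLO_INGEST_DEDUPE_WINDOW_SEC default
        self.window_sec = window_sec
        self._seen = OrderedDict()  # key -> time the key was first seen

    def is_duplicate(self, envelope, now):
        # Evict keys that have aged out of the window (oldest first).
        while self._seen:
            _, seen_at = next(iter(self._seen.items()))
            if now - seen_at <= self.window_sec:
                break
            self._seen.popitem(last=False)

        key = (
            envelope["trace_id"],
            envelope["event_type"],
            envelope["timestamp"],
            envelope["service"],
        )
        if key in self._seen:
            return True
        self._seen[key] = now
        return False
```

A retried envelope arriving within the window is reported as a duplicate and can be skipped before the write; the same key arriving after the window has passed is treated as new.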
Config knobs (all prefixed APOLLO_)
| Env var | Default | Purpose |
|---|---|---|
| APOLLO_INGEST_BATCH_SIZE | 50 | Max envelopes per POST body |
| APOLLO_INGEST_FLUSH_INTERVAL_MS | 500 | Max time an envelope waits in client buffer before flushing |
| APOLLO_INGEST_POST_TIMEOUT_SEC | 30 | Per-POST HTTP timeout; generous, since the server handler is in-memory only |
| APOLLO_INGEST_RETRY_ATTEMPTS | 2 | Bounded client retries on transient failure |
| APOLLO_INGEST_RETRY_BASE_MS | 200 | Base delay for exponential backoff |
| APOLLO_INGEST_RETRY_CAP_MS | 2000 | Max delay between retries |
| APOLLO_INGEST_QUEUE_MAXSIZE | 10000 | Server-side in-process queue capacity |
| APOLLO_INGEST_WORKER_CONCURRENCY | 4 | Number of background worker coroutines draining the queue |
| APOLLO_INGEST_WORKER_RETRY_ATTEMPTS | 2 | Bounded worker retries before dead-letter |
| APOLLO_INGEST_DEAD_LETTER_PATH | unset | Optional JSONL path for envelopes moved to dead-letter after worker retries exhausted |
| APOLLO_INGEST_STALE_WARN_SEC | 300 | Seconds without a successful ingest (in-process enqueue or secondary-path POST) before an expected-active service is flagged |
| APOLLO_INGEST_DEDUPE_WINDOW_SEC | 300 | Window for (trace_id, event_type, timestamp, service) dedupe on at-least-once delivery |
Layer 1 Intent Schema Obligation
Layer 1 is expected but not required to emit an intent_schema observation with each request. The obligation is best-effort throughout Phase 1 and Phase 2, with a configurable path to required once Layer 1's schema contracts stabilize.
Best-effort mode (default)
- Layer 1 SHOULD include an intent_schema block in every /chat request body it sends to oracle. Oracle extracts the block and emits the intent_schema observation to Apollo in-process (§Invariants 14: L1 never addresses Apollo). A request without the block is still served; oracle simply emits no intent_schema observation for that trace.
- If a schema is present on a trace, graph nodes are typed explicitly and the schema_mismatch failure signal (§Evaluator signal 2) is active for that trace.
- If a schema is absent, Apollo's extractors fall back to prompt-inference and mark the resulting nodes inferred=true. Drift detection and evaluator confidence weight inferred nodes lower. The schema_mismatch signal is not evaluated for that trace; the L3-performance penalty (§Evaluator) still fires on signal 1 (hard errors), but signal 2 is dark.
- GET /api/v1/apollo/stats reports intent_schema_coverage (the percentage of traces with a Layer 1 schema in the last rolling window) so admins can see when Layer 1 coverage is high enough to flip to required.
Required mode
- APOLLO_REQUIRE_INTENT_SCHEMA=true flips behavior: oracle rejects inbound /chat requests whose body lacks an intent_schema block with a 400 at the ingress. L1 is the direct caller and sees the rejection. Traces without a schema are never created; nothing to drop at the Observer layer.
- The flip is a config change, not a code change. No Apollo, oracle, or L1 redeploy is needed, but Layer 1's /chat emission behavior must already include the schema or the flip will start rejecting real traffic.
- Phase 3 is the expected time to flip, once Curator empowerment demands the cleaner signal. Admin can flip earlier if stats show high coverage.
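The two modes can be sketched as a single ingress check plus the coverage figure reported on /stats. Function names are hypothetical, the assumption that a valid intent_schema block is a JSON object is mine, and a returned 200 means "proceed with the request" rather than a final response status.

```python
def check_intent_schema(body, require_schema=False):
    """Return (status_code, emit_intent_schema) for an inbound /chat body.

    require_schema mirrors APOLLO_REQUIRE_INTENT_SCHEMA. Treating only a JSON
    object as a valid intent_schema block is an assumption.
    """
    has_schema = isinstance(body.get("intent_schema"), dict)
    if has_schema:
        return 200, True   # serve the request and emit the observation
    if require_schema:
        return 400, False  # required mode: reject at the ingress
    return 200, False      # best-effort mode: serve, emit nothing


def intent_schema_coverage(traces_with_schema, total_traces):
    """Percentage of traces in the window that carried a Layer 1 schema."""
    if total_traces == 0:
        return 0.0
    return 100.0 * traces_with_schema / total_traces
```

In best-effort mode a schema-less request is still served; only the required mode turns the missing block into a rejection that Layer 1 sees directly.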
Logging
Every Apollo module and every service participating in Apollo's
observation / injection loop uses the axonis-core logger rather than a
module-local logging.getLogger() call. The logger module is
axonis.logger — the axonis-core equivalent of the athena logging
utility at athena/athena/logger.py. Both implement the same three-logger
convention (log, error, audit) with identical handler shapes, so
logs from any component read coherently when aggregated.
Three loggers, three audiences
| Logger | When to use | Destination |
|---|---|---|
| log | Normal operational telemetry: info, warning, debug. | Console + axonis.log |
| error | Exceptions, permanent failures, data-loss events, misconfiguration. | Console + error.log |
| audit | Important transactions that must be traceable independently of volume. | audit.log (file only) |
Import pattern (oracle, axonis-core, beacon, cortex):
from axonis.logger import log, error, audit
Services that already depend on athena (e.g., athena itself) use
from athena.logger import log, error, audit instead. The API and format
are identical between the two.
What counts as audit-worthy
Apollo MUST route the following transactions through the audit logger
so they leave a trail in audit.log separate from regular operational
noise:
- Worker pool start / shutdown / cancellation (§Ingest Semantics).
- Graph snapshot completion (§Snapshots and trajectory), per hour.
- Every Curator action: promote, demote, forget, edit, rollback, compact, drift-hold, upstream-flag, pause_curator, resume_curator. (Complementary to the apollo_audit Elastic index: the audit log captures the event as structured text alongside other platform audit events; the Elastic index is the queryable, structured source of truth.)
- LLM synthesis proposals that result in a Curator commit (the proposal → drift-check → commit boundary).
- DriftEvent creation (§Graph-anchor drift check).
- Admin chat actions that mutate state, logged with actor: "admin:<username>".
- Guidance injection commits: every push of apollo_guidance onto an outbound envelope is audit-worthy at the commit level, though the per-turn attachment is a delivery detail (not audited).
- Subscriber connection / disconnection events on the admin SSE debug feed.
What stays in log / error
- Per-observation ingest (log.info / log.debug): too high-volume for audit.
- Per-attach-turn emissions: ditto.
- Retry attempts, transient failures: log.warning.
- Timeouts on the attach path (graceful degradation): log.warning.
- Queue overflow, exhausted retries, worker failures: error.
- Exceptions swallowed by the hot path: error, so they still land in error.log without propagating into the request path.
Rationale
Splitting the three channels keeps audit.log the single place an
operator or admin-chat tool can scan when investigating a system-level
state change without being drowned in routine telemetry. Separating
error.log keeps every permanent-failure signal (data loss, persistent
outage, contract violation) in one place regardless of which module it
came from.
Failure Posture
- Apollo slow on attach: the in-process guidance fetch is bounded by APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS (default 10 ms). On overshoot, oracle serializes the /chat response or MCP dispatch without apollo_guidance. Subscribers proceed without guidance on that turn, which is equivalent to pre-Apollo behavior. User sees no failure; metric apollo_guidance_attach_timeout_total surfaces the event.
- Apollo unreachable as a process: since Apollo is a package inside oracle, "Apollo unreachable" means oracle is itself broken, which is a larger incident. If the Apollo module fails to import or initialize at startup, oracle continues serving /chat and tool dispatches without apollo_guidance attached. Ingest endpoint returns 503.
- Ingest queue full: POST /api/v1/apollo/observations responds 202 but increments apollo_ingest_queue_dropped_total{service}. Never silent: visible on /stats under degraded_emitters.
- Ingest client POST fails: client retries within budget, then drops the batch and increments apollo_ingest_post_failure_total{service, kind}. Visible on /stats. Emitter's task continues unaffected (observations are telemetry, not transactional).
- Apollo worker crashes mid-ingest: at-least-once redelivery from the asyncio queue; observer dedupes on (trace_id, event_type, timestamp, service).
- Apollo hallucinates a bad artifact: subscribers apply it on one turn; the Evaluator detects outcome degradation on subsequent observations and demotes; the next attached payload reflects the demotion. Admin can force-rollback via audit log at any time.
- Curator goes rogue: every action (mutation + commit) is in the audit log; admin can pause_curator() immediately via chat or CLI. Paused Curator → artifact set stops changing → attached payloads continue to reflect the as-of-pause state until resume.
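The "slow on attach" posture can be sketched with asyncio.wait_for. This is an illustration under stated assumptions: attach_guidance and fetch_guidance are hypothetical names, and placing apollo_guidance as a top-level key of the envelope is assumed rather than specified.

```python
import asyncio


async def attach_guidance(envelope, fetch_guidance, timeout_ms=10):
    """Attach guidance if it arrives within the budget; otherwise fall through.

    timeout_ms mirrors APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS (default 10 ms).
    """
    try:
        guidance = await asyncio.wait_for(fetch_guidance(), timeout=timeout_ms / 1000)
    except asyncio.TimeoutError:
        # Overshoot: serialize without guidance. A real implementation would
        # also increment apollo_guidance_attach_timeout_total here.
        return envelope
    except Exception:
        # Internal Apollo failure: same graceful fall-through.
        return envelope
    return {**envelope, "apollo_guidance": guidance}


async def demo():
    async def fast():
        return {"hints": ["prefer tool A"]}

    async def slow():
        await asyncio.sleep(0.2)  # well past the 10 ms budget
        return {"hints": ["too late"]}

    attached = await attach_guidance({"response": "ok"}, fast)
    skipped = await attach_guidance({"response": "ok"}, slow)
    return attached, skipped
```

Either way the caller gets an envelope back, so the user-facing request never fails because guidance was slow or broken.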
Apollo's LLM
Apollo runs its own LLM, separate from oracle's user-facing LLM routing. Apollo's LLM is the primary driver of synthesis, invoked per event (see §Learner → LLM synthesis).
Model: pluggable by design
The model is selected by configuration and must remain swappable without code changes. Apollo's LLM provider layer normalizes across providers so that a newer, stronger model can replace the current one as the state of the art advances.
Current default: MiniMax M2.7.
It is the best-available fit at the time of this spec given its context window, cost profile, and availability — but the spec is deliberately agnostic. Apollo must not encode MiniMax-specific assumptions in prompt shapes, input formats, or response parsers. The provider layer handles any per-model translation.
Operators can swap the model by changing env vars only:
APOLLO_LLM_PROVIDER=minimax # current default; swap with any provider registered in the router
APOLLO_LLM_MODEL=m2.7 # current default; replace with a newer model when available
APOLLO_LLM_API_KEY=...
APOLLO_LLM_BASE_URL=... # for self-hosted or proxied inference
APOLLO_SYNTHESIS_MAX_CONCURRENT=4 # cap on concurrent synthesis calls (event bursts from L3)
The LLM router (oracle/oracle/llm.py) must support MiniMax as a first-class provider alongside anthropic / openai / groq / trinity / ollama in oracle's existing router. New providers register through the same interface — adding a model is an additive router change, never a change to Apollo's business logic.
Local MiniMax via HuggingFace (native, pre-trained, on-disk)
Apollo's LLM layer reserves a provider slot for a locally-stored, HuggingFace-pulled MiniMax model, selected with APOLLO_LLM_PROVIDER=minimax-local (see §Environment Configuration). It complements the default OpenAI-compatible endpoint path for air-gapped clusters, own-GPU inference, or operator-owned fine-tunes; the hosted path (openai provider) is unchanged and remains the default.
Canonical HuggingFace load signature. The provider MUST honor the model card's canonical from_pretrained call shape with trust_remote_code=True — required because MiniMax ships its own tokenizer and modeling code alongside the weights (a custom fine-tune needs the flag too):
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("MiniMaxAI/MiniMax-M2.7", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("MiniMaxAI/MiniMax-M2.7", trust_remote_code=True)
Apollo does not parse or override HuggingFace's on-disk cache layout (${HF_HOME:-~/.cache/huggingface}/hub/...). Pre-pulling the checkpoint (during image build or via an init container) is the recommended production pattern — the first cold from_pretrained downloads tens of gigabytes, unacceptable on the request path. MiniMax-M2.7 is large: plan for tens of GB on disk and a GPU/multi-GPU node with VRAM for the resolved context window; deployments that can't meet that budget stay on the hosted path.
Operator knobs (implemented). APOLLO_LLM_LOCAL_MODEL_PATH loads weights from outside the HF cache (passed verbatim as from_pretrained's first argument; unset → the stock MiniMaxAI/MiniMax-M2.7 checkpoint). APOLLO_LLM_LOCAL_DEVICE_MAP, APOLLO_LLM_LOCAL_TORCH_DTYPE, and APOLLO_LLM_LOCAL_LOAD_IN_4BIT / APOLLO_LLM_LOCAL_LOAD_IN_8BIT are additive from_pretrained overrides (4-bit wins if both bitsandbytes flags are set; an unrecognized dtype or a missing torch is logged and ignored rather than fatal).
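The precedence rules for these knobs can be sketched as a pure function that resolves the environment into load options. This is not the code in oracle/oracle/llm.py: the function name, the accepted boolean spellings, the dtype subset, and the shape of the returned dictionary are all assumptions, and the stdlib logger stands in for axonis.logger. The sketch deliberately stops short of calling from_pretrained.

```python
import logging

logger = logging.getLogger("apollo.llm.local")  # stand-in for axonis.logger

DEFAULT_CHECKPOINT = "MiniMaxAI/MiniMax-M2.7"
KNOWN_DTYPES = {"float16", "bfloat16", "float32"}  # illustrative subset (assumption)


def _flag(env, name):
    # Accepted truthy spellings are an assumption.
    return env.get(name, "").strip().lower() in {"1", "true", "yes"}


def resolve_local_load_options(env):
    """Map the APOLLO_LLM_LOCAL_* knobs to a plain options dict."""
    options = {
        # Unset path falls back to the stock checkpoint name.
        "model": env.get("APOLLO_LLM_LOCAL_MODEL_PATH") or DEFAULT_CHECKPOINT,
        "trust_remote_code": True,
    }
    if env.get("APOLLO_LLM_LOCAL_DEVICE_MAP"):
        options["device_map"] = env["APOLLO_LLM_LOCAL_DEVICE_MAP"]

    dtype = env.get("APOLLO_LLM_LOCAL_TORCH_DTYPE")
    if dtype:
        if dtype in KNOWN_DTYPES:
            options["torch_dtype"] = dtype
        else:
            # Unrecognized dtype: logged and ignored rather than fatal.
            logger.warning("ignoring unrecognized dtype %r", dtype)

    # 4-bit wins if both quantization flags are set.
    if _flag(env, "APOLLO_LLM_LOCAL_LOAD_IN_4BIT"):
        options["quantization"] = "4bit"
    elif _flag(env, "APOLLO_LLM_LOCAL_LOAD_IN_8BIT"):
        options["quantization"] = "8bit"
    return options
```

Keeping the resolution separate from the model load makes the precedence rules testable on a machine without torch or a GPU.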
Ships today vs. deferred. The minimax-local provider is implemented in oracle/oracle/llm.py (Milestone 8): lazy transformers import, the canonical load signature above, the operator knobs just listed, and prompt completion where weights and deps are present. Still deferred (tracked in the dev-plan): thread/process-pool offload of the synchronous forward pass (today it runs inline on the event loop); pre-pull orchestration + readiness gate; token streaming through the provider abstraction. Until those land, minimax-local is a dev / air-gapped-lab fallback and the production default stays APOLLO_LLM_PROVIDER=openai against an OpenAI-compatible MiniMax endpoint.
Separation from oracle's user-facing LLM
Apollo's LLM configuration is independent of oracle's user-facing LLM routing. The two can use the same provider or different providers; the same model or different models. Apollo's usage is tracked separately via the Meter (SPEC-03 §Metering) under client id apollo. User-facing chat metrics and Apollo metrics are separate in dashboards.
Budget isolation
A burst of synthesis calls triggered by a long fusion run must not starve user-facing chat. Apollo's LLM has its own rate limit, its own quota, and its own metering client id. When Apollo's quota is exceeded it defers synthesis (the event queue holds triggers up to a cap); user-facing oracle chat is unaffected.
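One way to picture the concurrency side of this isolation is a semaphore sized by APOLLO_SYNTHESIS_MAX_CONCURRENT. The sketch below is an assumption about mechanism, not the actual implementation: SynthesisGate and fake_synthesize are hypothetical, and quota accounting and the trigger-queue cap are left out.

```python
import asyncio


class SynthesisGate:
    """Cap the number of synthesis calls in flight at once."""

    def __init__(self, max_concurrent=4):  # APOLLO_SYNTHESIS_MAX_CONCURRENT default
        self._sem = asyncio.Semaphore(max_concurrent)
        self.in_flight = 0
        self.peak = 0  # highest concurrency observed, for the demo

    async def run(self, synthesize, trigger):
        async with self._sem:  # excess triggers wait here instead of piling onto the LLM
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
            try:
                return await synthesize(trigger)
            finally:
                self.in_flight -= 1


async def demo(n_triggers=10):
    gate = SynthesisGate(max_concurrent=4)

    async def fake_synthesize(trigger):
        await asyncio.sleep(0.01)  # stands in for an LLM round-trip
        return trigger

    results = await asyncio.gather(
        *(gate.run(fake_synthesize, t) for t in range(n_triggers))
    )
    return results, gate.peak
```

A burst of ten triggers still completes, but never with more than four calls outstanding, so Apollo's own load stays bounded regardless of how many events L3 produces.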
Drift Prevention
Apollo influences a large fraction of the system. Drift in its artifacts cascades into the prompts of layer 1 and the outputs of layer 3. The spec encodes several anti-drift guarantees:
- Observation cadence is fixed and coarse. No per-token events. High-signal-to-noise ratio in the raw data.
- Graphs are the deterministic anchor. Decision graphs update on every observation via rule-based extractors — they never create free-form artifacts and cannot drift on their own. They record what actually happened, not what the LLM thinks happened.
- Every LLM output is checked against the graphs. The LLM is the primary driver of synthesis, but every artifact, promotion, and pattern it proposes is validated against current graph state and trajectory before the Curator commits it. Proposals that diverge from graph-recorded reality are flagged as DriftEvent and held for admin review.
- Drift is detected structurally, not rate-limited. Short-window vs. long-window edge-weight divergence, rate-of-new-nodes caps, LLM-output-vs-graph divergence, and trajectory breaks distinguish smooth evolution (allowed) from sudden shift (flagged). Apollo can learn continuously because the graphs provide a rigid referent.
- Curator is bounded. Cannot touch auth, guardrails, or user data — only Apollo's own artifacts.
- Every Curator action is auditable. Admin can see what changed, when, why, and by whom.
- Every artifact is versioned. Rollback is always possible.
- Evaluator closes the loop. Artifacts that stop correlating with good outcomes decay automatically.
- Admin can pause the Curator. An emergency off-switch prevents runaway mutation.
- Guidance degrades gracefully. If Apollo is slow or wrong, oracle falls through without injection — the base system still functions.
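The short-window vs. long-window divergence check named in the list above can be illustrated with two exponentially weighted moving averages over one edge's weight history. Everything concrete here is an assumption: the smoothing factors, the 0.3 threshold, and the function names are illustrative, and the spec does not state how APOLLO_GRAPH_EWMA_SHORT and APOLLO_GRAPH_EWMA_LONG are parameterized.

```python
def ewma(values, alpha):
    """Exponentially weighted moving average; higher alpha reacts faster."""
    avg = values[0]
    for v in values[1:]:
        avg = alpha * v + (1 - alpha) * avg
    return avg


def edge_weight_diverges(history, short_alpha=0.5, long_alpha=0.05, threshold=0.3):
    """Flag a sudden shift: the fast average has pulled away from the slow one.

    All three parameters are illustrative values, not spec defaults.
    """
    short = ewma(history, short_alpha)
    long_term = ewma(history, long_alpha)
    return abs(short - long_term) > threshold


stable = [0.5] * 20                 # steady edge weight: no divergence
shifted = [0.1] * 20 + [0.9] * 5    # abrupt jump: fast average reacts, slow one lags
```

A steady history keeps both averages together, while an abrupt jump separates them for a while; that gap is what distinguishes a sudden shift from smooth evolution in this sketch.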
Environment Configuration
Apollo does not redefine any env var that already exists in the
platform deployment layer. The canonical source for deployment-level
configuration is developers-environment/conf/*.env — one file per
target (development.axonis.ai.env, matrix.axonis.ai.env,
edge.axonis.ai.env, vector.axonis.ai.env, etc.). Every target ships
a consistent platform baseline; Apollo inherits it transitively through
axonis-core, oracle, and its own storage/logger dependencies.
Inherited platform variables (not Apollo's to define)
| Variable(s) | Consumer | Apollo's use |
|---|---|---|
| ELASTIC_HOST, ELASTIC_USERNAME, ELASTIC_PASSWORD, ELASTIC_VERIFY, ELASTIC_TIMEOUT, ELASTIC_TEMPLATES, ELASTIC_PKI_CA | axonis.elastic.Elastic | Storage for apollo_observations, apollo_artifacts, apollo_graph_*, apollo_audit. Every Memory(UDS) subclass in apollo/memory/store.py inherits this config. |
| REDIS_URL (oracle-style) or REDIS_HOST + REDIS_PORT + REDIS_PASSWORD + REDIS_DB + REDIS_TLS + REDIS_VERIFY (platform-standard) | oracle/server/memory/*, axonis.redis.Redis | Oracle's ConversationStore + CrossServiceMemory; unused directly by Apollo. |
| SSO_CLIENT_ID, SSO_CLIENT_SECRET, SSO_TOKEN_URL, SSO_WELLKNOWN, SSO_INTROSPECT_URL, SSO_VERIFY | oracle's OAuthMiddleware (+ axonis.auth) | Validates Bearer tokens on every request reaching /api/v1/apollo/*. No Apollo-specific auth config. |
| ATLAS_LOG_LEVEL, ATLAS_WORKSPACE, AXONIS_LOG_LEVEL, AXONIS_WORKSPACE | axonis.logger (§Logging) | Log level + log-file root for Apollo's three logger streams (log / error / audit). oracle/tests/conftest.py also respects ATLAS_WORKSPACE for test-session log placement. |
| FEDERATE_DOMAIN, FEDERATE_NAME, FEDERATE_UUID, FEDERATE_PARTY_*, FEDERATE_PROTOCOL_*, FEDERATE_WORK_MODE_* | axonis.uds federation hooks | Picked up automatically if/when Apollo artifacts start federating (post-Milestone 13). No Apollo-specific federation config. |
Apollo-owned variables (all APOLLO_*)
Canonical location: developers-environment/conf/*.env — specifically the shared dev-env file (development.axonis.ai.env) plus any target-specific overrides (matrix.axonis.ai.env, vector.axonis.ai.env, edge.axonis.ai.env). Every APOLLO_* variable is declared there with a production-ready default. oracle/oracle/settings.py reads them via os.getenv(...) with fall-back defaults that match the env-file values, so if the shared env is unsourced the system still comes up sensibly — but the authoritative source is the deployment env file.
Why it lives in the shared env file rather than per-service: Apollo's observation path runs in oracle, but its configuration surface informs the contract every other service consumes (guidance attach budgets, trace-propagation expectations, retention windows). Keeping the defaults in the shared env file means oracle, parallax, cortex, and beacon all load the same baseline — an operator flipping APOLLO_CURATOR_AUTONOMOUS=true in the shared file affects the whole deployment consistently.
Every APOLLO_* variable Apollo's settings.py reads is mirrored in the env file. Grouped by subsystem:
- LLM: APOLLO_LLM_PROVIDER, APOLLO_LLM_MODEL, APOLLO_LLM_API_KEY, APOLLO_LLM_BASE_URL; local-MiniMax knobs APOLLO_LLM_LOCAL_MODEL_PATH, APOLLO_LLM_LOCAL_DEVICE_MAP, APOLLO_LLM_LOCAL_TORCH_DTYPE, APOLLO_LLM_LOCAL_LOAD_IN_4BIT, APOLLO_LLM_LOCAL_LOAD_IN_8BIT
- Synthesis: APOLLO_SYNTHESIS_MAX_CONCURRENT
- Ingest client side: APOLLO_INGEST_BATCH_SIZE, APOLLO_INGEST_FLUSH_INTERVAL_MS, APOLLO_INGEST_POST_TIMEOUT_SEC, APOLLO_INGEST_RETRY_ATTEMPTS, APOLLO_INGEST_RETRY_BASE_MS, APOLLO_INGEST_RETRY_CAP_MS
- Decision Graphs: APOLLO_GRAPH_SNAPSHOT_INTERVAL, APOLLO_GRAPH_EWMA_SHORT, APOLLO_GRAPH_EWMA_LONG, APOLLO_GRAPH_TRACE_STATE_TTL_SEC
- Evaluator: APOLLO_EVALUATOR_WEIGHT_L3_ERROR, APOLLO_EVALUATOR_WEIGHT_SCHEMA_MISMATCH, APOLLO_EVALUATOR_WEIGHT_USER_FEEDBACK, APOLLO_EVALUATOR_WEIGHT_CONFIDENCE, APOLLO_EVALUATOR_L3_FAST_DEMOTE_N, APOLLO_EVALUATOR_NORMAL_DEMOTE_N
- Curator: APOLLO_CURATOR_AUTONOMOUS, APOLLO_CURATOR_AUTO_INTERVAL_SEC
- Maintenance: APOLLO_MAINTENANCE_INTERVAL, APOLLO_OBSERVATION_RETENTION_DAYS
- Trace propagation: APOLLO_TRACE_HEADER, APOLLO_REQUIRE_TRACEPARENT
- Observation obligations: APOLLO_REQUIRE_INTENT_SCHEMA
- Integration (ApolloClient): APOLLO_BASE_URL
None of these duplicate a platform variable. When adding a new APOLLO_*, add it to both oracle/oracle/settings.py (with its default) and the shared env file (with the same default) in one commit.
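The "env value with a matching fall-back default" pattern can be sketched for the integer ingest knobs as follows. The defaults are the ones listed in the config-knob table; the function name, the dictionary layout, and the choice to fall back to the default on an unparsable value are assumptions and do not describe the real oracle/oracle/settings.py.

```python
import os

# Defaults copied from the ingest config-knob table.
INGEST_DEFAULTS = {
    "APOLLO_INGEST_BATCH_SIZE": 50,
    "APOLLO_INGEST_FLUSH_INTERVAL_MS": 500,
    "APOLLO_INGEST_POST_TIMEOUT_SEC": 30,
    "APOLLO_INGEST_RETRY_ATTEMPTS": 2,
    "APOLLO_INGEST_RETRY_BASE_MS": 200,
    "APOLLO_INGEST_RETRY_CAP_MS": 2000,
    "APOLLO_INGEST_QUEUE_MAXSIZE": 10000,
    "APOLLO_INGEST_WORKER_CONCURRENCY": 4,
    "APOLLO_INGEST_WORKER_RETRY_ATTEMPTS": 2,
    "APOLLO_INGEST_STALE_WARN_SEC": 300,
    "APOLLO_INGEST_DEDUPE_WINDOW_SEC": 300,
}


def load_ingest_settings(env=None):
    """Read each knob from the environment, falling back to its default."""
    env = os.environ if env is None else env
    settings = {}
    for name, default in INGEST_DEFAULTS.items():
        raw = env.get(name)
        try:
            settings[name] = int(raw) if raw not in (None, "") else default
        except ValueError:
            # Assumption: an unparsable value falls back instead of failing startup.
            settings[name] = default
    return settings
```

Because the in-code defaults equal the env-file values, an unsourced environment yields the same settings as the shared file, which is the behavior the paragraph above asks for.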
Per-deployment overrides
Each *.env in developers-environment/conf/ targets a specific
deployment (development, matrix, vector, edge, etc.). The shared
development.axonis.ai.env holds the baseline; production targets
override via their own file. Any Apollo variable that needs to differ
per target lives in the target-specific env — never hardcoded into
settings.py. Operators change behavior by editing the env file and
reloading, not by shipping code.
Dependencies
Apollo declares no new top-level dependencies — every library it needs is already in oracle's oracle/pyproject.toml. It inherits oracle's stack (axonis-core, FastAPI/Starlette, redis, the anthropic and openai provider SDKs) and activates two libraries oracle already ships but that are otherwise dormant: sentence-transformers (embeddings) and numpy (dense-vector math).
LLM provider SDK. Apollo's current default LLM is MiniMax M2.7 (see §Apollo's LLM). MiniMax exposes an OpenAI-compatible API, so Apollo reaches it via the existing openai client with APOLLO_LLM_BASE_URL pointed at the MiniMax endpoint — no new SDK dependency is added. If a future model swap requires a non-OpenAI-compatible provider, an additive dependency joins oracle's existing provider set.
Invariants
- Apollo does not execute workflows. It observes, learns, and advises. It never calls tools, never invokes backend services, never retries a failed request. Layer 1 drives iteration.
- Curator empowerment is bounded to Apollo's own artifacts. Curator cannot change auth, guardrails, token scopes, user conversations, or any non-Apollo state.
- Every autonomous action is auditable. No Curator mutation occurs without a record in apollo_audit.
- Apollo is internal. No Apollo endpoint is exposed outside the cluster except through oracle's existing external surface. Oracle remains the only externally exposed service (SPEC-03 invariant 1).
- Apollo failures do not break oracle. Guidance attachment has a hard timeout (APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS); on overshoot or internal Apollo failure, oracle serializes the response / MCP dispatch without apollo_guidance. Ingest failures never block the emitter's task and are surfaced as metrics (apollo_ingest_post_failure_total, apollo_ingest_queue_dropped_total) on /stats; they are never silent.
- Apollo uses axonis-core's Memory UDS as its storage primitive. It does not re-implement or bypass the UDS pattern from SPEC-01.
- Admin chat is the only conversational surface. Role admin is required. Non-admin users interact with Apollo only transitively via oracle.
- Observation cadence is coarse by design. Token-level observations are prohibited. Turn-level, tool-level, error-level, and response-level only.
- Axonis-core remains ML-free. Any future ML dependencies (e.g., embedding generation) live in oracle/oracle/, not in axonis-core. SPEC-01 invariant 1 is preserved.
- Artifacts are versioned; graphs are snapshotted. Every Curator mutation to an artifact creates a new version in apollo_artifact_history; graph-level rollback uses the hourly/daily/weekly snapshot tiers. Rollback is always possible.
- Oracle's existing memory modules are not modified. Apollo is additive and coexists with oracle/server/memory/* and oracle/server/models/memory.py throughout all three phases. Consolidation of the oracle memory modules is deferred and out of scope for this spec (tracked in the dev-plan).
- Apollo's LLM is pluggable. No MiniMax-specific assumptions in prompts, input shapes, or response parsers. Model swap is a config change via APOLLO_LLM_PROVIDER / APOLLO_LLM_MODEL, never a code change.
- Layer 3 performance is the strongest failure signal. Evaluator weighting amplifies L3 errors and schema mismatches over softer signals, accelerates demotion on L3-dominant score drops, and cascades to flag upstream artifacts for synthesis review.
- Neither L1 nor L3 addresses Apollo directly. L1 talks to oracle; L3 talks to oracle (via MCP); oracle talks to Apollo. L1 and L3 hold no Apollo endpoint knowledge, no Apollo credentials, and make no Apollo calls on any in-production path. Oracle is Apollo's sole emitter for all Phase-1 events: L1-origin observations (intent_schema, user_prompt, user_feedback) are emitted by oracle in-process after oracle receives the corresponding signal from L1; L3-origin observations (tool_output, tool_error) are emitted by oracle in-process after the MCP round-trip to an L3 service returns. Guidance flows the same way in reverse: it reaches L1 attached to /chat responses, reaches oracle's own chat LLM in-process (no transport, since oracle hosts Apollo), and reaches L3 attached to outbound MCP tool dispatches. The POST /api/v1/apollo/observations endpoint exists as a secondary path for admin replay/seed and for future services running outside oracle's MCP dispatch reach; Phase-1 emitters do not use it. No long-lived connections, no service tokens, no push channel in production.
- Injection cannot execute code in any subscriber. Attached apollo_guidance (or in-process cache contents on the L2 path) carries artifact data only. Subscribers update a local cache and consult it on their next LLM turn. Apollo cannot force a subscriber to act, call a tool, or mutate any state beyond its own cache.
- No subscriber registry, no push channel. Apollo has no list of subscribers to push to. Guidance is delivered by oracle attaching the current applicable set to every response/dispatch leaving oracle (L1 attach, L3 attach) and consulted in-process by oracle's own chat LLM on the L2 path. L3 agent eligibility is still governed by component_kind on the ServiceRegistry record (libraries are filtered out before attachment); L1 eligibility is implicit (every /chat response carries L1 guidance); L2 consumption is implicit (oracle's tool-executor consults the local cache before every turn).
- Apollo is the cross-service knowledge transfer channel. MemoryService (axonis-core) is strictly per-service: every recall is scoped to the calling service's (user_id, service). A preference, fact, or instruction expressed to one service is never directly readable by another. When the same intent needs to shape behaviour across services (e.g. "user prefers concise responses" expressed to beacon should also bias oracle), Apollo's observation stream picks it up, synthesis distills it into an artifact (e.g. a PromptShim with applicability.service_name = null for cross-service scope), and the guidance attach channel delivers it to every applicable subscriber. Apollo never instantiates MemoryService for cross-service reads; its view is the observation stream, which inherently spans all services. This separation means cross-service knowledge transfer is always curated, audited, and reversible (demote / forget) rather than implicit through silent shared-index reads.
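The "Apollo failures do not break oracle" invariant can be illustrated with a bounded attach step. This is a minimal sketch under assumptions, not oracle's actual code: attach_guidance, the fetch_guidance callable, and the in-memory METRICS dict are hypothetical stand-ins; only the apollo_guidance field and the apollo_guidance_attach_timeout_total counter name come from this spec.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical in-memory stand-in for oracle's metrics registry.
METRICS = {"apollo_guidance_attach_timeout_total": 0}


async def attach_guidance(
    response: dict,
    fetch_guidance: Callable[[], Awaitable[dict]],
    timeout_ms: int,
) -> dict:
    """Attach apollo_guidance if it arrives in time; otherwise return the response unchanged."""
    try:
        guidance = await asyncio.wait_for(fetch_guidance(), timeout=timeout_ms / 1000)
    except asyncio.TimeoutError:
        # Overshoot: count it and serialize without guidance, never fail the request.
        METRICS["apollo_guidance_attach_timeout_total"] += 1
        return response
    except Exception:
        # Internal Apollo failure: same outcome, the response goes out without guidance.
        return response
    return {**response, "apollo_guidance": guidance}


async def _demo() -> None:
    async def slow_fetch() -> dict:
        await asyncio.sleep(0.2)
        return {"artifacts": []}

    result = await attach_guidance({"text": "hello"}, slow_fetch, timeout_ms=10)
    print(result, METRICS)


if __name__ == "__main__":
    asyncio.run(_demo())
```

The caller passes the value of APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS as timeout_ms; the key property is that every failure path returns the original response rather than raising.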
Test Expectations
- Observer tests: each event type round-trips correctly through ingest; trace_id and parent_trace_id stitching works; cadence limits are enforced (no token-level events accepted); every Phase-1 event, whether L1-origin (intent_schema, user_prompt, user_feedback), oracle's own (llm_turn, final_response), or L3-origin (tool_output, tool_error), arrives via oracle's in-process emission path only. Oracle extracts L1 signals from the /chat request body and feedback submissions, observes the MCP round-trip for L3 outputs, and calls oracle.oracle.observer.ingest in-process on both layers' behalf. A direct emit from L1 credentials or from cortex to any Apollo path is rejected in Phase-1 test fixtures (§Invariants 14).
- HTTP ingest tests (secondary path): the POST /api/v1/apollo/observations endpoint continues to function for admin replay/seed and for out-of-process emitters. ApolloClient.emit POSTs the envelope and traceparent with an appropriate Bearer token (admin token for replay, user-forwarded token for out-of-process emitters); server returns 202 as soon as the envelope is enqueued on the in-process async queue; queue overflow increments apollo_ingest_queue_dropped_total and is visible on /stats (never silent); client retries on transient failures within the configured attempt budget, then surfaces apollo_ingest_post_failure_total on permanent failure; at-least-once redelivery is deduped on (trace_id, event_type, timestamp, service) within APOLLO_INGEST_DEDUPE_WINDOW_SEC; background worker crashes move envelopes to the optional dead-letter JSONL path when retry budget exhausts; services over lag/staleness thresholds appear in degraded_emitters.
- Memory tests: observations, artifacts, graph nodes, graph edges, and graph snapshots indices support CRUD via the axonis-core Elastic base class; embeddings generated on store; semantic recall via kNN composes with filters; the expires_ts + delete_by_query maintenance task coarsens and purges correctly.
- Graph update tests: extractors are deterministic on every observation; node/edge upserts are idempotent; EWMA short- and long-window weights update correctly; no artifacts are created on the deterministic path.
- Synthesis tests: each event-driven trigger (L1 request, L3 output, admin chat, guidance miss, admin-initiated) invokes the LLM once; concurrent synthesis is bounded by APOLLO_SYNTHESIS_MAX_CONCURRENT; duplicate triggers within a trace_id are coalesced to the latest observation; synthesis calls receive the correct subgraph and artifact context.
- Graph-anchor drift check tests: LLM proposals consistent with graph state are committed; proposals that contradict strongly-weighted edges are flagged as DriftEvent and held for admin review; rate-of-new-nodes cap triggers drift flagging.
- Guidance tests: intent → artifacts matching; layer filtering; caller-permission filtering (guardrails); empty-result fallback when artifacts index is empty; 50 ms timeout on the hot path.
- Evaluator tests: all four failure signals detected; L3 performance signals (1 and 2) weight heavier than user feedback and confidence; accelerated demotion (N=2) fires on L3-dominant score drops; upstream artifact re-flag cascade reaches IntentPattern / PromptShim / SpecFragment; repeated L3 failures escalate to DriftEvent rather than silent demotion; per-signal score decomposition is preserved in audit records.
- Curator tests: each allowed action (promote, demote, forget, edit, rollback, compact); every disallowed action is refused (auth changes, guardrail changes, user-data access); audit record written for every mutation with actor, trigger, before/after version; curator-pause blocks all Curator mutations.
- Versioning tests: artifact mutation copies prior version to apollo_artifact_history before overwrite; rollback restores the target version and creates a new version whose prev_version_id points at the post-rollback state; rollback event itself appears in audit; graph snapshots restore correctly; structural graph mutations by admin are audited.
- Admin chat tests: role gating (admin only); each chat tool executes correctly; audit log shows actor: "admin:<username>"; indefinite: true flag works for critical actions.
- Layer 1 schema tests: best-effort mode accepts traces without intent_schema and produces inferred nodes; required mode (APOLLO_REQUIRE_INTENT_SCHEMA=true) rejects schema-less traces with 400; intent_schema_coverage stat reports correct rolling percentage.
- LLM swap tests: provider swap via env (APOLLO_LLM_PROVIDER / APOLLO_LLM_MODEL) takes effect without code changes; MiniMax-via-OpenAI-compatible endpoint is exercised; no MiniMax-specific strings leak into prompt or response parsers.
- Injection channel tests: oracle attaches apollo_guidance to every /chat response body when an applicable artifact set exists for the caller's L1 scope; oracle attaches apollo_guidance to every MCP dispatch bound for an agent-kind L3 service; dispatches bound for library-kind services do not carry apollo_guidance; attached payload contains as_of, artifacts, and rationale_summary (no injection_id, trigger, or evidence_ref; those are audit-only); attach-timeout overshoot (APOLLO_GUIDANCE_ATTACH_TIMEOUT_MS) causes omission of apollo_guidance with an apollo_guidance_attach_timeout_total increment, not a request failure; Curator pause freezes the attached state (subscribers keep receiving the as-of-pause payload); L1 and L3 make no calls to Apollo endpoints in any test fixture, and attempts from non-admin tokens to admin preview endpoints return 403.
- component_kind tests: ServiceRegistry records carry a component_kind field (agent | library); oracle attaches apollo_guidance only to MCP dispatches bound for agent-kind services; library-kind services emit observations but receive no apollo_guidance in their dispatches; Evaluator signal 2 (schema_mismatch) fires only for agent-emitted tool_output with an L1 intent schema on the trace, and is skipped for library-emitted events; re-registering a service with a changed component_kind takes effect on the next dispatch without Apollo redeploy.
- ApolloGuidanceCache SDK tests: cache.update(payload) replaces the full artifact set idempotently; each canonical accessor (get_system_prompt_additions, get_spec_fragments, get_tool_description_overrides, get_tool_pairing_hints, get_active_failure_patterns, get_service_connection_hints) returns correctly-ordered (weight desc, recency desc) results; applicability filtering narrows by intent context; empty-cache fallback returns empty lists / None without blocking; the SDK holds no transport, no HTTP client, no auth state, and is a pure in-process data structure.
- Rationale + evidence tests: every attached apollo_guidance.artifacts[*] entry carries a non-empty rationale; every apollo_audit record carries a non-empty rationale and evidence_ref; LLM-driven actions produce synthesized rationales, deterministic Evaluator-driven actions produce templated rationales composed from score_decomposition; rationale and admin_note are distinct and both queryable; admin chat explain_decision(trace_id | artifact_id | audit_id) retrieves the stored rationale and resolves evidence_ref pointers to their underlying observations / graph snapshot / score decomposition; discuss_decision(artifact_id | audit_id) opens a chat thread with the rationale pre-loaded and permits inline action tool calls that themselves are audited with actor: "admin:<username>".
- Trace propagation tests: L1-minted traceparent arrives unchanged at oracle; oracle forwards the header unchanged on downstream MCP and REST dispatches via axonis_core.gateway.client.extract_http_headers(); MCP context field carries traceparent end-to-end; ApolloClient stamps both the header and envelope trace_id on every POST /observations; oracle mints a replacement and logs missing_traceparent when the header is absent (best-effort); oracle returns 400 when APOLLO_REQUIRE_TRACEPARENT=true and the header is absent or malformed; envelope trace_id wins when it differs from the header; a full lineage query returns every event for a single trace_id across all emitting layers.
- Integration tests: full lineage from Layer 1 intent_schema + user_prompt through oracle llm_turn and Layer 3 tool_output / tool_error to final_response, with observations captured at every boundary and artifacts produced by synthesis reflecting the lineage.
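The dedupe expectation in the HTTP ingest tests (redelivery deduped on (trace_id, event_type, timestamp, service) within APOLLO_INGEST_DEDUPE_WINDOW_SEC) could be exercised against a structure like the one below. This is a sketch under assumptions: the class name IngestDeduper is hypothetical, and measuring the window from the first arrival of a key is one possible reading of "within the window", not a confirmed detail of the ingest path.

```python
import time
from collections import OrderedDict
from typing import Callable


class IngestDeduper:
    """Drop envelopes whose dedupe key was already seen inside the window."""

    def __init__(self, window_sec: float, clock: Callable[[], float] = time.monotonic):
        self.window_sec = window_sec
        self.clock = clock
        # key -> first-seen time; insertion order equals arrival order.
        self._seen: "OrderedDict[tuple, float]" = OrderedDict()

    def is_duplicate(self, envelope: dict) -> bool:
        now = self.clock()
        # Evict keys whose window has elapsed, oldest first.
        while self._seen:
            oldest_key = next(iter(self._seen))
            if now - self._seen[oldest_key] < self.window_sec:
                break
            self._seen.popitem(last=False)
        key = (
            envelope["trace_id"],
            envelope["event_type"],
            envelope["timestamp"],
            envelope["service"],
        )
        if key in self._seen:
            return True
        self._seen[key] = now
        return False


if __name__ == "__main__":
    deduper = IngestDeduper(window_sec=60)
    envelope = {
        "trace_id": "trace-1",
        "event_type": "tool_output",
        "timestamp": "2025-01-01T00:00:00Z",
        "service": "cortex",
    }
    print(deduper.is_duplicate(envelope))  # first delivery
    print(deduper.is_duplicate(envelope))  # redelivery inside the window
```

Injecting the clock keeps the window behaviour testable without sleeping: a test fixture can advance a fake clock past the window and assert that the same envelope is accepted again.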
Depends on: component.oracle.gateway, platform.axonis-core, platform.service-contract