Oracle — LLM Gateway and Control Plane
Status: Implemented — oracle/ repo active; /chat, /tools, /services, /memory/*, /register, /graph,
/metrics endpoints; OAuthMiddleware, OpenTelemetry tracing, axonis-core memory integration all live.
Package: oracle
Depends on: platform.axonis-core, platform.service-contract
Milestone: P2 (after core + at least one service conforms)
Purpose
Oracle is the preferred external-facing gateway for the Axonis platform. When deployed, it is the only service exposed to clients outside the K8s cluster. Clients inside the cluster (e.g., Beacon) may communicate with Oracle or with backend services directly. See platform.ingress-routing for cluster ingress topology.
Oracle provides:
- Authentication and authorization — Keycloak token validation, per-role access control
- Tool aggregation — discovers backend services, builds unified tool catalog
- Request routing — forwards tool/API calls to the owning backend service
- Guardrails — enforces per-client tool access policies
- Metering — tracks token usage, tool call counts, latency
- LLM assistance — for clients without their own LLM ("dumb clients")
- Chat and memory — conversational interface with persistent memory
- Monitoring — Prometheus metrics, OpenTelemetry traces
Oracle is NOT an agent. It does not autonomously decide to call tools. In chat mode, it acts as an LLM-powered assistant that uses tools on behalf of the client. In passthrough mode, it is a transparent proxy with controls.
Architecture
Client → Oracle (/agentspace or /api/v1) → Backend Service (internal)
Oracle never exposes backend services directly. Clients see one endpoint with all tools aggregated.
Package Structure
oracle/
server/
__init__.py
__main__.py # Starlette: /agentspace, /api/v1, /health, /service-info
api/
__init__.py
routes.py # REST: /chat, /tools/{name}, /memory/*, /services
mcp/
__init__.py
server.py # Aggregated MCP tools from all backend services
registry.py # Service discovery + tool catalog
middleware/
__init__.py
auth.py # Keycloak token validation
guardrails.py # Per-role tool access control
metering.py # Usage tracking, Prometheus metrics
rate_limit.py # Per-client, per-tool rate limiting
llm/
__init__.py
router.py # Model selection, provider dispatch
providers/
__init__.py
anthropic.py
openai.py
groq.py
ollama.py
tool_executor.py # LLM tool-use loop for chat mode
memory/ # Thin wrappers — core implementation in axonis-core
__init__.py
charts/oracle/ # Bitnami Helm chart
.gitlab-ci.yml
.gitlab-ci-templates/
Dockerfile
pyproject.toml
Endpoints
MCP: /agentspace
Exposes the aggregated tool catalog from all backend services. When a client calls list_tools(), they see tools from
every registered service. When they call a tool, oracle validates auth + guardrails, then forwards to the owning
service.
REST: /api/v1
| Method | Path | Purpose |
|---|---|---|
| POST | /api/v1/chat |
Conversational LLM interface (dumb clients) |
| POST | /api/v1/tools/{name} |
Direct tool call (smart clients, no LLM) |
| GET | /api/v1/tools |
List all available tools |
| GET | /api/v1/services |
List registered backend services |
| POST | /api/v1/memory/store |
Store a fact |
| POST | /api/v1/memory/recall |
Recall facts |
| GET | /api/v1/memory/search |
Semantic search across memory |
| POST | /api/v1/register |
Dynamic service registration (push model) |
| GET | /api/v1/metrics |
Prometheus metrics |
Health: /health, /service-info
Standard contract per platform.service-contract.
Service Discovery
Oracle discovers backend services via two mechanisms:
Pull model (startup + periodic refresh)
ORACLE_SERVICES = os.getenv("ORACLE_SERVICES", "").split(",")
# e.g., "http://parallax:8003,http://prism:8004,http://cortex:8002,http://unified-dataspace:8009"
On startup and every 60 seconds:
- For each URL, call
GET /service-info - Call MCP
list_tools()on the service's/agentspace - Build/update the unified tool catalog
- Health-check: remove unavailable services, re-add when they recover
Push model (optional, for dynamic environments)
Services call POST /api/v1/register on startup with:
{
"name": "parallax",
"base_url": "http://parallax:8003",
"mcp_path": "/agentspace",
"health_path": "/health",
"ttl_seconds": 300
}
Oracle stores registrations in Redis. Services re-register on a heartbeat. Missed TTL = removal.
Both mechanisms coexist. Static config for core services, dynamic registration for extensions.
Tool Routing
Oracle maps each tool to its owning service:
# Built automatically from service discovery
TOOL_ROUTES = {
"fusion_run_start": "http://parallax:8003",
"fusion_run_status": "http://parallax:8003",
"execute_lens": "http://prism:8004",
"route": "http://prism:8004",
"insight_list": "http://cortex:8002",
"dataset_list": "http://unified-dataspace:8009",
...
}
When a client calls a tool:
- Oracle validates the Bearer token
- Checks guardrails: does this token's role allow this tool?
- Looks up the owning service in
TOOL_ROUTES - Forwards the call to the service's MCP endpoint with the original token
- Returns the result to the client
Object Routing (REST Gateway)
When the REST gateway is enabled (see platform.ingress-routing), Oracle receives all REST traffic and routes it to the
owning service using OBJECT_ROUTES, built from the objects field in each service's /service-info response:
# Built automatically from service discovery — parallel to TOOL_ROUTES
OBJECT_ROUTES = {
"insight": ("http://cortex:8002", "/api/v1"),
"signal": ("http://cortex:8002", "/api/v1"),
"block": ("http://cortex:8002", "/api/v1"),
"lens": ("http://parallax:8003", "/api/v2"),
"fusionrun": ("http://parallax:8003", "/api/v2"),
"sensor": ("http://sentinel:8005", "/api/v1"),
"alert": ("http://sentinel:8005", "/api/v1"),
...
# Objects not listed here fall through to fedai-rest
}
When a REST request arrives at Oracle in gateway mode:
- Oracle validates the Bearer token
- Extracts the object name from the request path:
/api/v1/{object}/...→ object name is path segment 3/userspace/{object}/...→ object name is path segment 2/dataspace/...→ always fedai-rest, no object lookup- Checks guardrails: does this token's role allow this HTTP method on this object?
- Looks up the owning service in
OBJECT_ROUTES(falls back to fedai-rest if not found) - Forwards the request to the owning service at its native path with the original Bearer token
- Returns the response to the client
Oracle does not rewrite paths. The path received is forwarded as-is to the target service, which handles both
/api/v1/{object}/... and /userspace/{object}/... natively (per platform.service-contract).
Dynamic Ingress Management
Oracle manages Kubernetes Ingress resources dynamically as services register and deregister, using
axonis.k8s.IngressManager from axonis-core (see platform.axonis-core).
On service registration:
- Oracle reads the
objectsfield from/service-info - Creates or updates an
oracle-managed-{service}-restIngress resource with path entries for each owned object at both/api/v1/{object}and/userspace/{object}prefixes - If
rest_gateway.enabled: false(default), the Ingress routes directly to the service - If
rest_gateway.enabled: true, the Ingress routes to Oracle itself (priority 200)
On deregistration or TTL expiry:
- Oracle deletes the
oracle-managed-{service}-restIngress resource - Traffic for that service's objects falls back to static chart-installed Ingress rules
Oracle never modifies or deletes static chart-installed Ingress resources (those without the
oracle-managed- prefix).
Guardrails
Per-role access is defined in configuration and applies to both MCP tool calls and REST requests when the
REST gateway is enabled. A caller carries a set of roles (Keycloak realm tokens carry multiple roles per
user, not one): effective policy is the union of every matched role's allow patterns, and a deny on any
matched role denies. REST guardrails match on {method}:{object} patterns:
roles:
analyst:
allow:
# MCP tools
- "fusion_*"
- "insight_*"
- "dataset_list"
- "dataset_get"
# REST objects (method:object)
- "GET:insight"
- "GET:signal"
- "POST:insight"
- "GET:dataset"
deny:
- "*_delete"
- "DELETE:*"
- "POST:dataset"
partner:
allow:
- "fusion_run_start"
- "fusion_run_status"
- "lens_list"
- "GET:lens"
- "GET:fusionrun"
deny:
- "*" # deny everything not explicitly allowed
admin:
allow: ["*"]
deny: []
Wildcard matching. Deny takes precedence over allow. When the REST gateway is disabled, REST guardrails are not evaluated (requests never reach Oracle).
Chat Mode (LLM Assistance)
When a client calls POST /api/v1/chat:
{
"message": "Screen our customer list against the VRS watchlist",
"conversation_id": "conv_abc",
"model": "default"
}
Oracle:
- Loads conversation history from Redis
- Retrieves relevant memory facts
- Builds a system prompt with available tools (filtered by caller's role)
- Calls the configured LLM provider
- If the LLM wants to call tools, oracle executes them against backend services
- Returns the LLM's response + any tool results
- Stores the conversation turn
The LLM never sees tools the caller doesn't have access to.
Implementation. Oracle owns its tool-use loop — role guardrail checks, HTTP-registry tool dispatch (forwarding each
call to the owning backend per §Tool Routing), metering, and Apollo L2 guidance/observation all live in Oracle. LLM
completions and streaming go through axonis-core's tool-aware Client (platform.axonis-core); Oracle implements no
provider code of its own.
Streaming Chat (optional, planned)
When a client sends POST /api/v1/chat with stream: true and Accept: text/event-stream, Oracle streams the turn
back as Server-Sent Events instead of buffering the whole ChatResponse. Auth, guardrails, rate-limiting, and the
503-when-no-LLM check all run before the stream opens (unchanged from the blocking path).
Event protocol (the shared chat-streaming contract — platform.service-contract §Chat Endpoint Pattern):
event: |
data |
Meaning |
|---|---|---|
delta |
{"text": "..."} |
Incremental assistant text; concatenation of all deltas equals the final response. |
tool_call |
{"id", "name", "service"} |
Oracle is dispatching a backend tool (L3). |
tool_result |
{"id", "ok": true} (or tool_error {"id", "ok": false, "detail"}) |
Tool returned / failed. |
done |
the full ChatResponse JSON |
Terminal success — identical to the non-streaming response (incl. apollo_guidance, tokens). |
error |
{"detail", "code"} |
Terminal failure after the stream opened. |
Mechanics:
- Streaming rides axonis-core's
Client.stream()/StreamChunkinside Oracle's own tool-loop: the loop forwardscontent_deltas outward asdeltaevents and emitstool_call/tool_resultaround each dispatch. Each turn still resolves to a typedResponse, so the loop's branch logic is identical to the non-streaming path. - Tool-call arguments are finalized only on the terminal chunk, so
tool_callsindoneis always well-formed. - Post-turn work (conversation store, Apollo
final_responseemit, guidance attach, memory extraction) is re-sequenced to stream close;apollo_guidancerides the terminaldoneevent. - When the resolved provider cannot stream natively, Oracle emits the full response as a single
deltathendone. - On client disconnect mid-stream, Oracle stops generation and persists the partial assistant turn.
Streaming is opt-in and additive: with stream absent or false, /chat behaves exactly as the blocking contract above.
OODA Phase-Aware Tool Registry
The OODA workflow (Observe → Orient → Decide → Act) is a first-class concept in the aggregated tool registry. Tools declare the phase they belong to, the catalog can be filtered by phase, and the chat-mode system prompt tells the LLM which phase it is operating in. This gives external MCP agents the same workflow guardrails Beacon's UI provides to interactive users, instead of relying on the UI alone to sequence the flow.
Phase Annotation
Every tool surfaced through Oracle's aggregated catalog carries an OODA phase, sourced from the owning
backend service's tool metadata (e.g. cortex's @tool(phase=...) registry).
- #REQ.phase-metadata — each tool in the aggregated catalog stores its OODA phase alongside its
capability and routing metadata. The phase is one of
observe,orient,decide,act, ornullfor cross-cutting tools available in every phase. - #REQ.phase-cross-cutting — cross-cutting tools (e.g.
describe_model,list_accessible_data,get_profile_context, report tools) annotate asnulland are never filtered out by phase gating.
| Phase | Representative tools |
|---|---|
| Observe | list_signals, get_signal, count_signals, kpi_batch, acknowledge_signal, set_signal_disposition |
| Orient | query_data, aggregate_data, batch_operations, create_insight, pin_block_to_insight, freeze_block, add_note, suggest_next_blocks, memory_* |
| Decide | create_edition, freeze_edition_for_attestation, attest_edition, request_edition_review, recommend_decision_template, validate_evidence_sufficiency |
| Act | create_task, claim_task, complete_task, escalate_task, get_decision_lineage |
Cross-cutting (null) |
describe_model, list_accessible_data, get_profile_context, explain_visibility_boundary, list_users, report_* |
Phase-Gated Filtering
- #REQ.phase-filter — when a
tools/listrequest or chat request carries aphase, Oracle returns only that phase's tools plus all cross-cutting (null) tools; phase filtering composes with the capability and per-tool allowlist filters as an intersection. - #REQ.phase-fallback — absent a
phase(e.g. freeform chat with no investigation context), no phase filtering is applied: all otherwise-permitted tools remain available. Phase gating never narrows the default behaviour. - #REQ.phase-prompt — in chat mode, when the request/investigation context sets a phase, the system prompt names the active phase and lists the phase-relevant tools with one-line descriptions, so the LLM produces phase-appropriate behaviour (e.g. "You are in ORIENT phase. Gather evidence. Do not create editions or attest decisions.").
Per-Profile Phase Restriction
Profiles (via accountability packs / archetypes resolved upstream) may restrict which OODA phases they participate in. Oracle enforces the restriction during tool filtering:
- #REQ.phase-allowed — a profile may declare
allowed_phases(e.g.[observe, orient]); Oracle drops tools whose phase is outside that set before returning the catalog or dispatching a call. - #REQ.phase-allowed-default — absent
allowed_phases, all phases are permitted (backward-compatible). Default archetype phase access:analyst→ observe + orient;principal→ observe + orient + decide;commander→ all phases.
Phase enforcement is soft-first: system-prompt guidance (§ooda-phase-registry.filtering) precedes hard tool filtering, so power users in freeform sessions are not over-restricted.
Per-Tool Access Gating
Guardrails (§Guardrails) gate by role pattern; capabilities gate groups of tools. Per-tool gating adds a
finer layer: a profile can be granted create_block without freeze_block, and every tool is gated
— no tool is callable solely because it lacks a capability flag.
Every Tool Is Gated
- #REQ.no-ungated-tool — no tool is callable without an explicit access grant. A tool with no
capability and no allowlist entry is denied, not implicitly allowed. This closes the gap where
capability-less tools (e.g. raw
query_data) were reachable by any authenticated session.
Per-Tool Allowlist
A profile carries a tools allowlist — the source of truth for which tools the profile may see and call,
enforced at the MCP layer (tools/list and tool dispatch alike), not only inside the LLM loop.
- #REQ.tools-allowlist —
tools/listreturns the intersection of capability-matched tools, the profile'stoolsallowlist, and (when present) the active phase set; tool dispatch rejects any call to a tool outside the profile'stoolsallowlist. - #REQ.tools-allowlist-fallback — when a profile declares no
toolsfield, Oracle falls back to capability-only filtering (backward-compatible). - #REQ.tools-allowlist-union — across an archetype inheritance chain,
toolsallowlists are unioned during resolution, matching the existingllm_toolsunion behaviour.
Multi-Role Profile Resolution
A caller carries a SET of roles (the Keycloak realm roles on the token), not a single role. Profiles are
keyed by role; the caller's effective profile is the union of the tools/llm_tools allowlists of
every profile whose key matches one of the caller's roles — the same union-allow model as §Guardrails. So
"the profile's allowlist" in the requirements above is the caller's role-union allowlist.
- #REQ.profile-role-union — the effective
tools(andllm_tools) allowlist is the union across every role-keyed profile matching one of the caller's roles, each itself archetype-union-resolved (#REQ.tools-allowlist-union). The permit rule composes as guardrail(roles) ∩ capability ∩ effective-allowlist. When no matched profile declares atoolsfield, the capability-only fallback (#REQ.tools-allowlist-fallback) applies. - #REQ.no-role-no-grant — a caller with no resolved role (an empty or metadata-only token) is denied (fail-closed), never an implicit grant.
LLM Allowlist Is a Subset
The llm_tools allowlist (which tools the chat-mode LLM may invoke, §Chat Mode) must never exceed what
the profile itself can access.
- #REQ.llm-tools-subset — at profile load/compile time Oracle validates
llm_tools ⊆ tools. A violation (the LLM granted a tool the profile cannot access) fails loudly with a clear error rather than silently granting the LLM broader access than its human profile.
This gating applies uniformly to MCP clients and the chat-mode LLM; an external agent cannot reach an intelligence tool by connecting to a less-gated surface (see §Workflow Generation Orchestration for the NL→workflow path that also routes through this gating).
Workflow Generation Orchestration
Oracle's chat/LLM control plane accepts a natural-language request and returns an executable workflow graph — a directed sequence of dataset/query/modelling operation nodes — that a frontend can place onto a project sheet and execute. Oracle owns the orchestration contract: it receives the NL request, drives the generation backend, and returns the node graph; it does not itself author the operations. Enhanced/learning-assisted generation is Apollo's role (component.oracle.apollo §workflow-generation-hints).
Contract
- #REQ.workflow-gen-request — a workflow-generation request carries the natural-language text plus optional dataset/context references; it is dispatched through the same auth + per-tool gating (§Per-Tool Access Gating) as any other tool call — the generation backend is invoked only with tools the caller's profile permits.
- #REQ.workflow-gen-response — the response is a serializable directed graph of operation nodes (each node = a dataset operation with its parameters and edges to predecessors), shaped so a frontend can render it directly onto the project sheet. Oracle returns the graph; it does not execute it unless separately asked.
- #REQ.workflow-gen-stateless — generation is stateless per call: Oracle forwards the caller's token and request to the generation backend and retains no per-call generation state beyond standard chat history and metering. Data privacy is preserved because no workflow-generation state is persisted in the generation backend.
Quality Target
- #REQ.workflow-gen-quality — the generation capability targets a ≥90% success rate across the acceptance test suite (a generated workflow is "successful" when it is a valid, executable node graph for the request). This target gates the feature's readiness, not individual requests.
LLM Provider Configuration
llm:
# default_provider selects which provider serves /chat when the request model is "default";
# set via ORACLE_LLM_DEFAULT_PROVIDER. Production points this at trinity (internal vLLM).
default_provider: anthropic
providers:
anthropic:
model: claude-sonnet-4-20250514
api_key: ${ANTHROPIC_API_KEY}
openai:
model: gpt-4o
api_key: ${OPENAI_API_KEY}
groq:
model: llama-3.3-70b-versatile
api_key: ${GROQ_API_KEY}
ollama:
base_url: http://ollama:11434/v1
model: llama3.1:8b
trinity: # internal vLLM — arcee-ai/Trinity-Large-Thinking; production default
base_url: http://trinity:8000/v1
model: arcee-ai/Trinity-Large-Thinking
api_key: ${TRINITY_API_KEY}
Clients can request a specific model in the chat request. The default is used if not specified. The providers are
axonis-core's Client (platform.axonis-core); Oracle holds no provider code of its own.
Metering
Oracle tracks per-client:
- Tool call count (by tool name)
- LLM token usage (input + output)
- Request latency (p50, p95, p99)
- Error rate
Exposed as Prometheus metrics at /api/v1/metrics and via OpenTelemetry spans.
Memory
Oracle uses MemoryService (axonis.memory.service) for all memory operations. MemoryService wraps Memory(UDS) + Redis
with graceful degradation — see platform.axonis-core for the full contract.
Memory records written by Oracle include service="oracle", and Oracle's recall is strictly scoped to its own
service — MemoryService(service="oracle").recall(...) only returns records oracle wrote. This matches the
platform-wide rule in platform.axonis-core: every service reads only its own memories. Cross-service knowledge — e.g.
"this user prefers concise responses" expressed to beacon should also shape oracle's behaviour — flows through
Apollo (component.oracle.apollo), not by Oracle reading another service's memory directly.
from axonis.memory.service import MemoryService
memory = MemoryService(service="oracle")
results = memory.recall(query="...", token=caller_token)
Dependencies
[project]
dependencies = [
"axonis-core",
"anthropic>=0.40.0",
"openai>=1.0.0",
"httpx>=0.27.0",
"starlette>=0.36.0",
"fastapi>=0.110.0",
"uvicorn[standard]>=0.29.0",
"redis>=4.0.0",
"prometheus-client",
]
No ML dependencies. Lightweight.
Invariants
- Oracle is the only service exposed directly to external clients. In a full-stack K8s deployment, backend services
are ClusterIP only. The
agentspace.cluster.localingress hostname may route to Oracle (preferred, higher priority) or Cortex (fallback, lower priority) depending on which is deployed — see platform.ingress-routing for ingress topology. Direct service hostnames (cortex.cluster.local, etc.) are available to intra-cluster clients. - Oracle never executes domain logic. It routes, authenticates, meters, and optionally reasons via LLM. It never imports parallax, prism, cortex, or fedai-rest code.
- The LLM is optional. Oracle works in pure passthrough mode (tool routing only) without any LLM configured. Chat endpoints return 503 if no LLM is configured.
- Token passthrough. Oracle forwards the client's original Bearer token to backend services. Backend services re-validate. Oracle does not mint its own tokens.
- Service discovery is HTTP-based. No platform-specific discovery mechanism. Works on K8s, Docker Compose, bare metal, or mixed environments.
- Guardrails are fail-closed. If a role is not configured, no tools are allowed. If guardrail config is missing, all tools are denied.
Test Expectations
- Tool routing tests (mock backend services)
- Auth tests (valid token, expired token, wrong role)
- Guardrail tests (allow/deny patterns, wildcard matching)
- Chat mode tests (mock LLM provider, verify tool dispatch)
- Service discovery tests (pull model + push model)
- Metering tests (verify counters increment)
- Memory tests (MemoryService recall with forwarded token)
- Health check and service-info tests
Depends on: component.oracle.apollo, platform.axonis-core, platform.ingress-routing, platform.service-contract
Required by: component.beacon.ticketing, component.conduit.service, component.oracle.apollo, platform.apollo, platform.ingress-routing