Oracle — LLM Gateway and Control Plane

Status: Implemented — oracle/ repo active; /chat, /tools, /services, /memory/*, /register, /graph, /metrics endpoints; OAuthMiddleware, OpenTelemetry tracing, axonis-core memory integration all live. Package: oracle Depends on: platform.axonis-core, platform.service-contract Milestone: P2 (after core + at least one service conforms)

Purpose

Oracle is the preferred external-facing gateway for the Axonis platform. When deployed, it is the only service exposed to clients outside the K8s cluster. Clients inside the cluster (e.g., Beacon) may communicate with Oracle or with backend services directly. See platform.ingress-routing for cluster ingress topology.

Oracle provides:

Authentication and authorization — Keycloak token validation, per-role access control
Tool aggregation — discovers backend services, builds unified tool catalog
Request routing — forwards tool/API calls to the owning backend service
Guardrails — enforces per-client tool access policies
Metering — tracks token usage, tool call counts, latency
LLM assistance — for clients without their own LLM ("dumb clients")
Chat and memory — conversational interface with persistent memory
Monitoring — Prometheus metrics, OpenTelemetry traces

Oracle is NOT an agent. It does not autonomously decide to call tools. In chat mode, it acts as an LLM-powered assistant that uses tools on behalf of the client. In passthrough mode, it is a transparent proxy with controls.

Architecture

Client → Oracle (/agentspace or /api/v1) → Backend Service (internal)

Oracle never exposes backend services directly. Clients see one endpoint with all tools aggregated.

Package Structure

oracle/
  server/
    __init__.py
    __main__.py                  # Starlette: /agentspace, /api/v1, /health, /service-info
    api/
      __init__.py
      routes.py                  # REST: /chat, /tools/{name}, /memory/*, /services
    mcp/
      __init__.py
      server.py                  # Aggregated MCP tools from all backend services
      registry.py                # Service discovery + tool catalog
    middleware/
      __init__.py
      auth.py                    # Keycloak token validation
      guardrails.py              # Per-role tool access control
      metering.py                # Usage tracking, Prometheus metrics
      rate_limit.py              # Per-client, per-tool rate limiting
    llm/
      __init__.py
      router.py                  # Model selection, provider dispatch
      providers/
        __init__.py
        anthropic.py
        openai.py
        groq.py
        ollama.py
      tool_executor.py           # LLM tool-use loop for chat mode
    memory/                      # Thin wrappers — core implementation in axonis-core
      __init__.py
  charts/oracle/                 # Bitnami Helm chart
  .gitlab-ci.yml
  .gitlab-ci-templates/
  Dockerfile
  pyproject.toml

Endpoints

MCP: `/agentspace`

Exposes the aggregated tool catalog from all backend services. When a client calls list_tools(), they see tools from every registered service. When they call a tool, oracle validates auth + guardrails, then forwards to the owning service.

REST: `/api/v1`

Method	Path	Purpose
POST	`/api/v1/chat`	Conversational LLM interface (dumb clients)
POST	`/api/v1/tools/{name}`	Direct tool call (smart clients, no LLM)
GET	`/api/v1/tools`	List all available tools
GET	`/api/v1/services`	List registered backend services
POST	`/api/v1/memory/store`	Store a fact
POST	`/api/v1/memory/recall`	Recall facts
GET	`/api/v1/memory/search`	Semantic search across memory
POST	`/api/v1/register`	Dynamic service registration (push model)
GET	`/api/v1/metrics`	Prometheus metrics

Health: `/health`, `/service-info`

Standard contract per platform.service-contract.

Service Discovery

Oracle discovers backend services via two mechanisms:

Pull model (startup + periodic refresh)

ORACLE_SERVICES = os.getenv("ORACLE_SERVICES", "").split(",")
# e.g., "http://parallax:8003,http://prism:8004,http://cortex:8002,http://unified-dataspace:8009"

On startup and every 60 seconds:

For each URL, call GET /service-info
Call MCP list_tools() on the service's /agentspace
Build/update the unified tool catalog
Health-check: remove unavailable services, re-add when they recover

Push model (optional, for dynamic environments)

Services call POST /api/v1/register on startup with:

{
  "name": "parallax",
  "base_url": "http://parallax:8003",
  "mcp_path": "/agentspace",
  "health_path": "/health",
  "ttl_seconds": 300
}

Oracle stores registrations in Redis. Services re-register on a heartbeat. Missed TTL = removal.

Both mechanisms coexist. Static config for core services, dynamic registration for extensions.

Tool Routing

Oracle maps each tool to its owning service:

# Built automatically from service discovery
TOOL_ROUTES = {
    "fusion_run_start": "http://parallax:8003",
    "fusion_run_status": "http://parallax:8003",
    "execute_lens": "http://prism:8004",
    "route": "http://prism:8004",
    "insight_list": "http://cortex:8002",
    "dataset_list": "http://unified-dataspace:8009",
    ...
}

When a client calls a tool:

Oracle validates the Bearer token
Checks guardrails: does this token's role allow this tool?
Looks up the owning service in TOOL_ROUTES
Forwards the call to the service's MCP endpoint with the original token
Returns the result to the client

Object Routing (REST Gateway)

When the REST gateway is enabled (see platform.ingress-routing), Oracle receives all REST traffic and routes it to the owning service using OBJECT_ROUTES, built from the objects field in each service's /service-info response:

# Built automatically from service discovery — parallel to TOOL_ROUTES
OBJECT_ROUTES = {
    "insight":          ("http://cortex:8002",   "/api/v1"),
    "signal":           ("http://cortex:8002",   "/api/v1"),
    "block":            ("http://cortex:8002",   "/api/v1"),
    "lens":             ("http://parallax:8003", "/api/v2"),
    "fusionrun":        ("http://parallax:8003", "/api/v2"),
    "sensor":           ("http://sentinel:8005", "/api/v1"),
    "alert":            ("http://sentinel:8005", "/api/v1"),
    ...
    # Objects not listed here fall through to fedai-rest
}

When a REST request arrives at Oracle in gateway mode:

Oracle validates the Bearer token
Extracts the object name from the request path:
/api/v1/{object}/... → object name is path segment 3
/userspace/{object}/... → object name is path segment 2
/dataspace/... → always fedai-rest, no object lookup
Checks guardrails: does this token's role allow this HTTP method on this object?
Looks up the owning service in OBJECT_ROUTES (falls back to fedai-rest if not found)
Forwards the request to the owning service at its native path with the original Bearer token
Returns the response to the client

Oracle does not rewrite paths. The path received is forwarded as-is to the target service, which handles both /api/v1/{object}/... and /userspace/{object}/... natively (per platform.service-contract).

Dynamic Ingress Management

Oracle manages Kubernetes Ingress resources dynamically as services register and deregister, using axonis.k8s.IngressManager from axonis-core (see platform.axonis-core).

On service registration:

Oracle reads the objects field from /service-info
Creates or updates an oracle-managed-{service}-rest Ingress resource with path entries for each owned object at both /api/v1/{object} and /userspace/{object} prefixes
If rest_gateway.enabled: false (default), the Ingress routes directly to the service
If rest_gateway.enabled: true, the Ingress routes to Oracle itself (priority 200)

On deregistration or TTL expiry:

Oracle deletes the oracle-managed-{service}-rest Ingress resource
Traffic for that service's objects falls back to static chart-installed Ingress rules

Oracle never modifies or deletes static chart-installed Ingress resources (those without the oracle-managed- prefix).

Guardrails

Per-role access is defined in configuration and applies to both MCP tool calls and REST requests when the REST gateway is enabled. A caller carries a set of roles (Keycloak realm tokens carry multiple roles per user, not one): effective policy is the union of every matched role's allow patterns, and a deny on any matched role denies. REST guardrails match on {method}:{object} patterns:

roles:
  analyst:
    allow:
      # MCP tools
      - "fusion_*"
      - "insight_*"
      - "dataset_list"
      - "dataset_get"
      # REST objects (method:object)
      - "GET:insight"
      - "GET:signal"
      - "POST:insight"
      - "GET:dataset"
    deny:
      - "*_delete"
      - "DELETE:*"
      - "POST:dataset"

  partner:
    allow:
      - "fusion_run_start"
      - "fusion_run_status"
      - "lens_list"
      - "GET:lens"
      - "GET:fusionrun"
    deny:
      - "*"   # deny everything not explicitly allowed

  admin:
    allow: ["*"]
    deny: []

Wildcard matching. Deny takes precedence over allow. When the REST gateway is disabled, REST guardrails are not evaluated (requests never reach Oracle).

Chat Mode (LLM Assistance)

When a client calls POST /api/v1/chat:

{
  "message": "Screen our customer list against the VRS watchlist",
  "conversation_id": "conv_abc",
  "model": "default"
}

Oracle:

Loads conversation history from Redis
Retrieves relevant memory facts
Builds a system prompt with available tools (filtered by caller's role)
Calls the configured LLM provider
If the LLM wants to call tools, oracle executes them against backend services
Returns the LLM's response + any tool results
Stores the conversation turn

The LLM never sees tools the caller doesn't have access to.

Implementation. Oracle owns its tool-use loop — role guardrail checks, HTTP-registry tool dispatch (forwarding each call to the owning backend per §Tool Routing), metering, and Apollo L2 guidance/observation all live in Oracle. LLM completions and streaming go through axonis-core's tool-aware Client (platform.axonis-core); Oracle implements no provider code of its own.

Streaming Chat (optional, planned)

When a client sends POST /api/v1/chat with stream: true and Accept: text/event-stream, Oracle streams the turn back as Server-Sent Events instead of buffering the whole ChatResponse. Auth, guardrails, rate-limiting, and the 503-when-no-LLM check all run before the stream opens (unchanged from the blocking path).

Event protocol (the shared chat-streaming contract — platform.service-contract §Chat Endpoint Pattern):

`event:`	`data`	Meaning
`delta`	`{"text": "..."}`	Incremental assistant text; concatenation of all deltas equals the final `response`.
`tool_call`	`{"id", "name", "service"}`	Oracle is dispatching a backend tool (L3).
`tool_result`	`{"id", "ok": true}` (or `tool_error` `{"id", "ok": false, "detail"}`)	Tool returned / failed.
`done`	the full `ChatResponse` JSON	Terminal success — identical to the non-streaming response (incl. `apollo_guidance`, `tokens`).
`error`	`{"detail", "code"}`	Terminal failure after the stream opened.

Mechanics:

Streaming rides axonis-core's Client.stream() / StreamChunk inside Oracle's own tool-loop: the loop forwards content_deltas outward as delta events and emits tool_call / tool_result around each dispatch. Each turn still resolves to a typed Response, so the loop's branch logic is identical to the non-streaming path.
Tool-call arguments are finalized only on the terminal chunk, so tool_calls in done is always well-formed.
Post-turn work (conversation store, Apollo final_response emit, guidance attach, memory extraction) is re-sequenced to stream close; apollo_guidance rides the terminal done event.
When the resolved provider cannot stream natively, Oracle emits the full response as a single delta then done.
On client disconnect mid-stream, Oracle stops generation and persists the partial assistant turn.

Streaming is opt-in and additive: with stream absent or false, /chat behaves exactly as the blocking contract above.

OODA Phase-Aware Tool Registry

The OODA workflow (Observe → Orient → Decide → Act) is a first-class concept in the aggregated tool registry. Tools declare the phase they belong to, the catalog can be filtered by phase, and the chat-mode system prompt tells the LLM which phase it is operating in. This gives external MCP agents the same workflow guardrails Beacon's UI provides to interactive users, instead of relying on the UI alone to sequence the flow.

Phase Annotation

Every tool surfaced through Oracle's aggregated catalog carries an OODA phase, sourced from the owning backend service's tool metadata (e.g. cortex's @tool(phase=...) registry).

#REQ.phase-metadata — each tool in the aggregated catalog stores its OODA phase alongside its capability and routing metadata. The phase is one of observe, orient, decide, act, or null for cross-cutting tools available in every phase.
#REQ.phase-cross-cutting — cross-cutting tools (e.g. describe_model, list_accessible_data, get_profile_context, report tools) annotate as null and are never filtered out by phase gating.

Phase	Representative tools
Observe	`list_signals`, `get_signal`, `count_signals`, `kpi_batch`, `acknowledge_signal`, `set_signal_disposition`
Orient	`query_data`, `aggregate_data`, `batch_operations`, `create_insight`, `pin_block_to_insight`, `freeze_block`, `add_note`, `suggest_next_blocks`, `memory_*`
Decide	`create_edition`, `freeze_edition_for_attestation`, `attest_edition`, `request_edition_review`, `recommend_decision_template`, `validate_evidence_sufficiency`
Act	`create_task`, `claim_task`, `complete_task`, `escalate_task`, `get_decision_lineage`
Cross-cutting (`null`)	`describe_model`, `list_accessible_data`, `get_profile_context`, `explain_visibility_boundary`, `list_users`, `report_*`

Phase-Gated Filtering

#REQ.phase-filter — when a tools/list request or chat request carries a phase, Oracle returns only that phase's tools plus all cross-cutting (null) tools; phase filtering composes with the capability and per-tool allowlist filters as an intersection.
#REQ.phase-fallback — absent a phase (e.g. freeform chat with no investigation context), no phase filtering is applied: all otherwise-permitted tools remain available. Phase gating never narrows the default behaviour.
#REQ.phase-prompt — in chat mode, when the request/investigation context sets a phase, the system prompt names the active phase and lists the phase-relevant tools with one-line descriptions, so the LLM produces phase-appropriate behaviour (e.g. "You are in ORIENT phase. Gather evidence. Do not create editions or attest decisions.").

Per-Profile Phase Restriction

Profiles (via accountability packs / archetypes resolved upstream) may restrict which OODA phases they participate in. Oracle enforces the restriction during tool filtering:

#REQ.phase-allowed — a profile may declare allowed_phases (e.g. [observe, orient]); Oracle drops tools whose phase is outside that set before returning the catalog or dispatching a call.
#REQ.phase-allowed-default — absent allowed_phases, all phases are permitted (backward-compatible). Default archetype phase access: analyst → observe + orient; principal → observe + orient + decide; commander → all phases.

Phase enforcement is soft-first: system-prompt guidance (§ooda-phase-registry.filtering) precedes hard tool filtering, so power users in freeform sessions are not over-restricted.

Per-Tool Access Gating

Guardrails (§Guardrails) gate by role pattern; capabilities gate groups of tools. Per-tool gating adds a finer layer: a profile can be granted create_block without freeze_block, and every tool is gated — no tool is callable solely because it lacks a capability flag.

Every Tool Is Gated

#REQ.no-ungated-tool — no tool is callable without an explicit access grant. A tool with no capability and no allowlist entry is denied, not implicitly allowed. This closes the gap where capability-less tools (e.g. raw query_data) were reachable by any authenticated session.

Per-Tool Allowlist

A profile carries a tools allowlist — the source of truth for which tools the profile may see and call, enforced at the MCP layer (tools/list and tool dispatch alike), not only inside the LLM loop.

#REQ.tools-allowlist — tools/list returns the intersection of capability-matched tools, the profile's tools allowlist, and (when present) the active phase set; tool dispatch rejects any call to a tool outside the profile's tools allowlist.
#REQ.tools-allowlist-fallback — when a profile declares no tools field, Oracle falls back to capability-only filtering (backward-compatible).
#REQ.tools-allowlist-union — across an archetype inheritance chain, tools allowlists are unioned during resolution, matching the existing llm_tools union behaviour.

Multi-Role Profile Resolution

A caller carries a SET of roles (the Keycloak realm roles on the token), not a single role. Profiles are keyed by role; the caller's effective profile is the union of the tools/llm_tools allowlists of every profile whose key matches one of the caller's roles — the same union-allow model as §Guardrails. So "the profile's allowlist" in the requirements above is the caller's role-union allowlist.

#REQ.profile-role-union — the effective tools (and llm_tools) allowlist is the union across every role-keyed profile matching one of the caller's roles, each itself archetype-union-resolved (#REQ.tools-allowlist-union). The permit rule composes as guardrail(roles) ∩ capability ∩ effective-allowlist. When no matched profile declares a tools field, the capability-only fallback (#REQ.tools-allowlist-fallback) applies.
#REQ.no-role-no-grant — a caller with no resolved role (an empty or metadata-only token) is denied (fail-closed), never an implicit grant.

LLM Allowlist Is a Subset

The llm_tools allowlist (which tools the chat-mode LLM may invoke, §Chat Mode) must never exceed what the profile itself can access.

#REQ.llm-tools-subset — at profile load/compile time Oracle validates llm_tools ⊆ tools. A violation (the LLM granted a tool the profile cannot access) fails loudly with a clear error rather than silently granting the LLM broader access than its human profile.

This gating applies uniformly to MCP clients and the chat-mode LLM; an external agent cannot reach an intelligence tool by connecting to a less-gated surface (see §Workflow Generation Orchestration for the NL→workflow path that also routes through this gating).

Workflow Generation Orchestration

Oracle's chat/LLM control plane accepts a natural-language request and returns an executable workflow graph — a directed sequence of dataset/query/modelling operation nodes — that a frontend can place onto a project sheet and execute. Oracle owns the orchestration contract: it receives the NL request, drives the generation backend, and returns the node graph; it does not itself author the operations. Enhanced/learning-assisted generation is Apollo's role (component.oracle.apollo §workflow-generation-hints).

Contract

#REQ.workflow-gen-request — a workflow-generation request carries the natural-language text plus optional dataset/context references; it is dispatched through the same auth + per-tool gating (§Per-Tool Access Gating) as any other tool call — the generation backend is invoked only with tools the caller's profile permits.
#REQ.workflow-gen-response — the response is a serializable directed graph of operation nodes (each node = a dataset operation with its parameters and edges to predecessors), shaped so a frontend can render it directly onto the project sheet. Oracle returns the graph; it does not execute it unless separately asked.
#REQ.workflow-gen-stateless — generation is stateless per call: Oracle forwards the caller's token and request to the generation backend and retains no per-call generation state beyond standard chat history and metering. Data privacy is preserved because no workflow-generation state is persisted in the generation backend.

Quality Target

#REQ.workflow-gen-quality — the generation capability targets a ≥90% success rate across the acceptance test suite (a generated workflow is "successful" when it is a valid, executable node graph for the request). This target gates the feature's readiness, not individual requests.

LLM Provider Configuration

llm:
  # default_provider selects which provider serves /chat when the request model is "default";
  # set via ORACLE_LLM_DEFAULT_PROVIDER. Production points this at trinity (internal vLLM).
  default_provider: anthropic
  providers:
    anthropic:
      model: claude-sonnet-4-20250514
      api_key: ${ANTHROPIC_API_KEY}
    openai:
      model: gpt-4o
      api_key: ${OPENAI_API_KEY}
    groq:
      model: llama-3.3-70b-versatile
      api_key: ${GROQ_API_KEY}
    ollama:
      base_url: http://ollama:11434/v1
      model: llama3.1:8b
    trinity:                                  # internal vLLM — arcee-ai/Trinity-Large-Thinking; production default
      base_url: http://trinity:8000/v1
      model: arcee-ai/Trinity-Large-Thinking
      api_key: ${TRINITY_API_KEY}

Clients can request a specific model in the chat request. The default is used if not specified. The providers are axonis-core's Client (platform.axonis-core); Oracle holds no provider code of its own.

Metering

Oracle tracks per-client:

Tool call count (by tool name)
LLM token usage (input + output)
Request latency (p50, p95, p99)
Error rate

Exposed as Prometheus metrics at /api/v1/metrics and via OpenTelemetry spans.

Memory

Oracle uses MemoryService (axonis.memory.service) for all memory operations. MemoryService wraps Memory(UDS) + Redis with graceful degradation — see platform.axonis-core for the full contract.

Memory records written by Oracle include service="oracle", and Oracle's recall is strictly scoped to its own service — MemoryService(service="oracle").recall(...) only returns records oracle wrote. This matches the platform-wide rule in platform.axonis-core: every service reads only its own memories. Cross-service knowledge — e.g. "this user prefers concise responses" expressed to beacon should also shape oracle's behaviour — flows through Apollo (component.oracle.apollo), not by Oracle reading another service's memory directly.

from axonis.memory.service import MemoryService

memory = MemoryService(service="oracle")
results = memory.recall(query="...", token=caller_token)

Dependencies

[project]
dependencies = [
    "axonis-core",
    "anthropic>=0.40.0",
    "openai>=1.0.0",
    "httpx>=0.27.0",
    "starlette>=0.36.0",
    "fastapi>=0.110.0",
    "uvicorn[standard]>=0.29.0",
    "redis>=4.0.0",
    "prometheus-client",
]

No ML dependencies. Lightweight.

Invariants

Oracle is the only service exposed directly to external clients. In a full-stack K8s deployment, backend services are ClusterIP only. The agentspace.cluster.local ingress hostname may route to Oracle (preferred, higher priority) or Cortex (fallback, lower priority) depending on which is deployed — see platform.ingress-routing for ingress topology. Direct service hostnames (cortex.cluster.local, etc.) are available to intra-cluster clients.
Oracle never executes domain logic. It routes, authenticates, meters, and optionally reasons via LLM. It never imports parallax, prism, cortex, or fedai-rest code.
The LLM is optional. Oracle works in pure passthrough mode (tool routing only) without any LLM configured. Chat endpoints return 503 if no LLM is configured.
Token passthrough. Oracle forwards the client's original Bearer token to backend services. Backend services re-validate. Oracle does not mint its own tokens.
Service discovery is HTTP-based. No platform-specific discovery mechanism. Works on K8s, Docker Compose, bare metal, or mixed environments.
Guardrails are fail-closed. If a role is not configured, no tools are allowed. If guardrail config is missing, all tools are denied.

Test Expectations

Tool routing tests (mock backend services)
Auth tests (valid token, expired token, wrong role)
Guardrail tests (allow/deny patterns, wildcard matching)
Chat mode tests (mock LLM provider, verify tool dispatch)
Service discovery tests (pull model + push model)
Metering tests (verify counters increment)
Memory tests (MemoryService recall with forwarded token)
Health check and service-info tests

Depends on: component.oracle.apollo, platform.axonis-core, platform.ingress-routing, platform.service-contract

Required by: component.beacon.ticketing, component.conduit.service, component.oracle.apollo, platform.apollo, platform.ingress-routing