Service Configuration and Settings Management

Status: Adopted. AxonisSettings and ServicePorts live in axonis-core >= 4.13.0. Migration of existing services is in progress.
Depends on: platform.axonis-core
Relates to: platform.service-contract (service anatomy: config file location, Helm chart structure)
Milestone: P2 (all services must conform; net-new services must conform from day one)

Purpose

Every Axonis service reads configuration from environment variables. This spec defines the single standard for doing so: a pydantic-settings base class (AxonisSettings) that lives in axonis-core and is extended by each service. It eliminates per-service redeclaration of shared infrastructure fields, enforces startup validation, and makes port assignments visible and conflict-free via a central registry.

This spec applies to both net-new services (follow the pattern from the start) and existing services being migrated (follow the phased migration path).

The AxonisSettings Base Class

AxonisSettings lives at axonis.settings.AxonisSettings. It extends pydantic_settings.BaseSettings and declares all fields shared across two or more Axonis services.

from axonis.settings import AxonisSettings

Shared fields provided by the base class

Server (unprefixed — intentional; see Naming Conventions):

Field      Env var    Default
host       HOST       "0.0.0.0"
log_level  LOG_LEVEL  "INFO"
workers    WORKERS    os.cpu_count() or 1
debug      DEBUG      False

SSO / Auth:

Field              Env var            Default
sso_wellknown      SSO_WELLKNOWN      "https://sso.axonis.ai/realms/T2S/.well-known/openid-configuration"
sso_client_id      SSO_CLIENT_ID      "public-clients"
sso_client_secret  SSO_CLIENT_SECRET  None
sso_token_url      SSO_TOKEN_URL      "https://sso.axonis.ai/realms/T2S/protocol/openid-connect/token"
sso_verify         SSO_VERIFY         True
sso_issuer         n/a                (computed from sso_wellknown)

Elasticsearch:

Field             Env var           Default
elastic_host      ELASTIC_HOST      "https://127.0.0.1:9200"
elastic_username  ELASTIC_USERNAME  None
elastic_password  ELASTIC_PASSWORD  None
elastic_verify    ELASTIC_VERIFY    True
elastic_timeout   ELASTIC_TIMEOUT   20
elastic_scroll    ELASTIC_SCROLL    "5m"
elastic_pki_ca    ELASTIC_PKI_CA    None

Redis:

Field           Env var         Default
redis_host      REDIS_HOST      "localhost"
redis_port      REDIS_PORT      6379
redis_password  REDIS_PASSWORD  None
redis_tls       REDIS_TLS       False

LLM credentials:

Field              Env var            Default
anthropic_api_key  ANTHROPIC_API_KEY  None
openai_api_key     OPENAI_API_KEY     None
groq_api_key       GROQ_API_KEY       None

OpenTelemetry:

Field                        Env var                      Default
otel_enabled                 OTEL_ENABLED                 False
otel_service_name            OTEL_SERVICE_NAME            "axonis"
otel_exporter_otlp_endpoint  OTEL_EXPORTER_OTLP_ENDPOINT  ""
otel_exporter_otlp_protocol  OTEL_EXPORTER_OTLP_PROTOCOL  "http/protobuf"

The ServicePorts Registry

All service port assignments live at axonis.ports.ServicePorts. Services must never hardcode port numbers — always reference ServicePorts.<SERVICE>.

from axonis.ports import ServicePorts

# Current assignments
ServicePorts.ORACLE    = 8001
ServicePorts.CORTEX    = 8002
ServicePorts.BEACON    = 8003
ServicePorts.PRISM     = 8004
ServicePorts.PARALLAX  = 8005
ServicePorts.SENTINEL  = 8005  # TODO: conflict — see Known Issues
ServicePorts.CONDUIT   = 8008
ServicePorts.FORGE     = 8010
ServicePorts.GEODEX    = 8011
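
A registry only keeps ports conflict-free if duplicates are actually detected. The sketch below shows one way a CI check could do that. The ServicePorts class here is a stand-in that mirrors the assignments listed above, and find_port_conflicts is a hypothetical helper, not part of axonis-core.

```python
from collections import defaultdict


class ServicePorts:
    """Stand-in mirroring the assignments listed above (not the axonis-core source)."""

    ORACLE = 8001
    CORTEX = 8002
    BEACON = 8003
    PRISM = 8004
    PARALLAX = 8005
    SENTINEL = 8005
    CONDUIT = 8008
    FORGE = 8010
    GEODEX = 8011


def find_port_conflicts(registry: type) -> dict[int, list[str]]:
    """Return {port: [service names]} for every port assigned more than once."""
    by_port: dict[int, list[str]] = defaultdict(list)
    for name, value in vars(registry).items():
        # Only upper-case integer attributes count as port assignments.
        if name.isupper() and isinstance(value, int):
            by_port[value].append(name)
    return {port: names for port, names in by_port.items() if len(names) > 1}


if __name__ == "__main__":
    print(find_port_conflicts(ServicePorts))  # {8005: ['PARALLAX', 'SENTINEL']}
```

A plain class is used here rather than enum.IntEnum because an enum silently turns a duplicate value into an alias of the first member, which would hide exactly the conflict this check is meant to surface.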

Naming Conventions

Unprefixed — shared infrastructure fields that belong to the platform, not a specific service. All fields provided by AxonisSettings use unprefixed env vars: HOST, LOG_LEVEL, WORKERS, DEBUG, SSO_*, ELASTIC_*, REDIS_*, OTEL_*, ANTHROPIC_API_KEY, OPENAI_API_KEY, GROQ_API_KEY, TRINITY_API_KEY, OLLAMA_API_KEY.

Service-prefixed — fields owned by a specific service. Use {SERVICE}_{FIELD} in uppercase:

  • PORT is always service-prefixed (e.g. CONDUIT_PORT, BEACON_PORT) because each service has a different default. It is never unprefixed.
  • All domain-specific fields use the service prefix (e.g. AIRFLOW_BASE_URL, DAG_REPO_URL, ORACLE_TTL_SECONDS).

LLM capability fields — the platform LLM client lives in axonis-core (platform.axonis-core), so its provider knobs are shared, unprefixed config: API keys (ANTHROPIC_API_KEY, OPENAI_API_KEY, GROQ_API_KEY, TRINITY_API_KEY, OLLAMA_API_KEY) plus the per-provider {PROVIDER}_MODEL and {PROVIDER}_BASE_URL knobs and ORACLE_LLM_DEFAULT_PROVIDER. Per-service selection (which provider/model a given service uses, temperature, etc.) still uses the service prefix via Spec.from_env(prefix="{SERVICE}_LLM") per platform.service-contract.
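
The prefixing rules above can be summarised as a small decision function. This is an illustrative sketch only: env_var_name is a hypothetical helper, and the shared-name sets cover just the server fields, the infrastructure prefixes, and three of the API keys named in this section.

```python
# Assumption: a subset of the shared names from this spec, for illustration.
SHARED_EXACT = {
    "HOST", "LOG_LEVEL", "WORKERS", "DEBUG",
    "ANTHROPIC_API_KEY", "OPENAI_API_KEY", "GROQ_API_KEY",
}
SHARED_PREFIXES = ("SSO_", "ELASTIC_", "REDIS_", "OTEL_")


def env_var_name(service: str, field: str) -> str:
    """Return the env var a service would read for the given settings field."""
    name = field.upper()
    if name == "PORT":
        # PORT is never unprefixed: every service has its own default.
        return f"{service.upper()}_PORT"
    if name in SHARED_EXACT or name.startswith(SHARED_PREFIXES):
        # Shared platform fields keep their unprefixed name.
        return name
    # Everything else is owned by the service and carries its prefix.
    return f"{service.upper()}_{name}"
```

For example, env_var_name("conduit", "port") gives CONDUIT_PORT, env_var_name("conduit", "host") gives HOST, and env_var_name("oracle", "ttl_seconds") gives ORACLE_TTL_SECONDS.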

Net-New Service Pattern

# <service>/server/config.py
from functools import lru_cache
from pydantic import Field
from axonis.settings import AxonisSettings
from axonis.ports import ServicePorts


class Settings(AxonisSettings):
    # Service-specific fields only — shared fields are inherited
    port: int = Field(default=ServicePorts.MY_SERVICE, alias="MY_SERVICE_PORT")
    my_service_domain: str = Field(default="localhost", alias="MY_SERVICE_DOMAIN")


@lru_cache
def get_settings() -> Settings:
    return Settings()

Rules:

  • Do not redeclare any field already in AxonisSettings. If you need it, it is already there.
  • port uses a service-prefixed alias; all other server fields (host, log_level, workers, debug) are inherited unprefixed.
  • otel_service_name should be overridden in the Helm configmap/values.yaml, not in code, so the default "axonis" is replaced at deploy time without touching Python.
  • get_settings() is the only way to access settings. Do not pass settings as constructor arguments between modules; call get_settings() at the point of use.

Phased Migration

Migrating an existing service requires three phases. Each phase must be a separate MR. Do not combine phases.

The reason: Phase 2 (callsite replacement) touches many files across the service and must be reviewable on its own. Mixing it with config structure changes (Phase 1) or Helm changes (Phase 3) makes the diff unreadable and risks missed callsites.


Phase 1 — Migrate the settings file

Goal: Replace the service's config class with an AxonisSettings subclass.

  1. Read the current config.py (or settings.py) in full.
  2. Create a new Settings(AxonisSettings) subclass:
       • Remove every field that is now inherited from AxonisSettings (cross-reference the field tables above).
       • Keep only service-specific fields.
       • Add Field(default=..., alias="ENV_VAR_NAME") to every remaining field if not already present.
       • Add a port field referencing ServicePorts.<SERVICE>.
  3. Replace the module-level singleton (config = Config()) with:

         @lru_cache
         def get_settings() -> Settings:
             return Settings()

  4. Update all import sites within the service that reference from <service>.config import config: change them to from <service>.config import get_settings and replace config.<field> with get_settings().<field>.
  5. Do not touch os.getenv callsites elsewhere in the codebase yet; that is Phase 2.

Phase 1 complete when: The service starts and all existing tests pass. config.py contains only service-specific fields.


Phase 2 — Replace os.environ callsites

Goal: Remove all direct environment variable reads from the service codebase. Every config value must flow through get_settings().

Search for every occurrence of the following patterns across the entire service directory (excluding .venv, __pycache__, and test fixtures that intentionally set env vars):

os\.getenv\(
os\.environ\.get\(
os\.environ\[
os\.environ\.setdefault\(

For each hit:

  1. Identify which AxonisSettings field (or service-specific field) covers this env var.
  2. Replace with get_settings().<field>.
  3. Add from <service>.config import get_settings to the file's imports if not already present.
  4. Remove the import os line from the file if os is no longer used for anything else.

If a callsite reads an env var that is not in AxonisSettings and is not yet declared in the service's Settings subclass, add it to the subclass in this phase (do not add to AxonisSettings without discussion — new shared fields require updating all services).

Phase 2 complete when: grep -rn "os\.getenv\|os\.environ" <service>/ --include="*.py" (excluding .venv and __pycache__) returns zero results outside of test fixtures.
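
The same search can be scripted so that the remaining callsites are listed with file and line number. The sketch below is one possible helper, not an existing axonis-core tool: find_env_reads and scan_service are hypothetical names, and the script does not distinguish test fixtures from service code, so fixture hits still need to be filtered by hand.

```python
import re
from pathlib import Path

# The four Phase 2 patterns combined into a single expression.
ENV_READ = re.compile(
    r"os\.getenv\(|os\.environ\.get\(|os\.environ\[|os\.environ\.setdefault\("
)
SKIP_DIRS = {".venv", "__pycache__"}


def find_env_reads(source: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for each direct env read in source."""
    return [
        (lineno, line.strip())
        for lineno, line in enumerate(source.splitlines(), start=1)
        if ENV_READ.search(line)
    ]


def scan_service(root: Path) -> dict[Path, list[tuple[int, str]]]:
    """Scan every .py file under root, skipping .venv and __pycache__."""
    hits: dict[Path, list[tuple[int, str]]] = {}
    for path in sorted(root.rglob("*.py")):
        if SKIP_DIRS.intersection(path.parts):
            continue
        found = find_env_reads(path.read_text(encoding="utf-8"))
        if found:
            hits[path] = found
    return hits
```

Calling scan_service(Path("<service>")) with the service directory substituted returns an empty dict once Phase 2 is complete for all non-fixture code.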


Phase 3 — Update the Helm chart

Goal: Align configmap.yaml and values.yaml env var names with the standardised names.

Rename the following env vars in configmap.yaml and values.yaml for each service (only the renames that apply):

Old name             Standard name    Affected services
CONDUIT_HOST         HOST             conduit
CONDUIT_PORT         CONDUIT_PORT     (no change; port stays prefixed)
CONDUIT_WORKERS      WORKERS          conduit
CONDUIT_LOG_LEVEL    LOG_LEVEL        conduit
SSO_TLS_VERIFY       SSO_VERIFY       conduit
BEACON_OTEL_ENABLED  OTEL_ENABLED     beacon
CORTEX_HOST          HOST             cortex
CORTEX_PORT          CORTEX_PORT      (no change; port stays prefixed)
CORTEX_WORKERS       WORKERS          cortex
CORTEX_LOG_LEVEL     LOG_LEVEL        cortex
ES_URL               ELASTIC_HOST     parallax
ES_VERIFY            ELASTIC_VERIFY   parallax
ES_TIMEOUT           ELASTIC_TIMEOUT  parallax
PARALLAX_HOST        HOST             parallax
PARALLAX_PORT        PARALLAX_PORT    (no change; port stays prefixed)
ORACLE_HOST          HOST             oracle
ORACLE_PORT          ORACLE_PORT      (no change; port stays prefixed)
ORACLE_LOG_LEVEL     LOG_LEVEL        oracle
SENTINEL_HOST        HOST             sentinel
SENTINEL_PORT        SENTINEL_PORT    (no change; port stays prefixed)

After renaming:

  • Verify that every field in Settings (base + subclass) has a corresponding entry in the configmap or is populated via the secrets block.
  • Verify that no configmap entry references a field that no longer exists in Settings.

Phase 3 complete when: helm template renders without errors and all env var names in the configmap match the field aliases in Settings.
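
The rename and the two verification steps can be rehearsed on plain dictionaries before touching the chart. The sketch below is illustrative: rename_env_vars and check_alignment are hypothetical helpers, RENAMES holds only the conduit rows of the rename table, and loading the real configmap.yaml and collecting the aliases from Settings are left to the caller.

```python
# Conduit rows of the rename table; extend with the rows for the service at hand.
RENAMES = {
    "CONDUIT_HOST": "HOST",
    "CONDUIT_WORKERS": "WORKERS",
    "CONDUIT_LOG_LEVEL": "LOG_LEVEL",
    "SSO_TLS_VERIFY": "SSO_VERIFY",
}


def rename_env_vars(configmap: dict[str, str], renames: dict[str, str]) -> dict[str, str]:
    """Return a copy of configmap with old env var names replaced by standard ones."""
    return {renames.get(key, key): value for key, value in configmap.items()}


def check_alignment(
    configmap_keys: set[str], settings_env_names: set[str]
) -> tuple[set[str], set[str]]:
    """Return (fields with no configmap entry, configmap entries with no field)."""
    missing = settings_env_names - configmap_keys
    stale = configmap_keys - settings_env_names
    return missing, stale
```

Names reported as missing are not necessarily errors: per the first verification step, a field may instead be populated via the secrets block.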

Known Issues

These must be resolved before the affected services complete Phase 3:

  1. Beacon port conflict: beacon/config.py defaults to 8002 (cortex's port); beacon.env sets PORT=8003. ServicePorts.BEACON = 8003 is the intended value. The config.py default must be updated to reference ServicePorts.BEACON in Phase 1.

  2. Parallax / Sentinel port conflict: both are assigned 8005 in ServicePorts. One must be reassigned before either service is deployed alongside the other. Until resolved, both carry a # TODO comment in ports.py. Do not start Phase 3 for either service until this is resolved.

  3. Internal axonis-core callsites: axonis/apollo/integration.py hardcodes http://oracle:8080/api/v1/apollo (oracle's port is 8001, not 8080). axonis/k8s/kubernetes.py hardcodes "titan" in Kubernetes labels. axonis/middleware/metering.py hardcodes Meter('oracle'). These are axonis-core issues, not service issues, but they should be cleaned up in a dedicated axonis-core MR that replaces hardcoded values with ServicePorts references.

Invariants

  1. No module-level singleton. config = Config() and settings = Settings() at module level are forbidden. Use @lru_cache on get_settings() so instantiation is deferred until first call.

  2. No direct env reads in service code. os.getenv, os.environ.get, os.environ[, and os.environ.setdefault are forbidden in service source files (outside test fixtures). All configuration flows through get_settings().

  3. PORT is always service-prefixed. It is always {SERVICE}_PORT because each service has a different default. HOST, LOG_LEVEL, WORKERS, and DEBUG are unprefixed.

  4. Port defaults always reference ServicePorts. Never hardcode an integer port as a Field default. Always use ServicePorts.<SERVICE>.

  5. extra="ignore" must be set. The base class sets this. Do not override it to "forbid" — unknown env vars injected by Kubernetes (from shared configmaps, node metadata, etc.) must not cause startup failures.

  6. Phases must not be combined. Phase 1, 2, and 3 must each be a separate MR. A reviewer cannot meaningfully review a diff that simultaneously restructures the settings class, replaces dozens of os.getenv calls, and renames Helm values.

Test Requirements

  • Every service must have a test that calls get_settings() and asserts key field values are read from env vars. Use monkeypatch.setenv or unittest.mock.patch.dict(os.environ, {...}) and call get_settings.cache_clear() before and after.
  • Test that the service starts (or Settings() instantiates) without error when only required fields are set.
  • Test that unknown env vars do not cause a ValidationError (confirms extra="ignore").
  • No test may call os.getenv to read a field that belongs in Settings — use get_settings() in tests too.

Depends on: platform.axonis-core, platform.service-contract

Required by: component.postern.proxy, platform.devops-cicd, platform.testing