
Spec Organization & Tracking Governance

Principle — identity, order, and title are three layers

A spec address must never fuse what it is with where it sits or what it's called. SPEC-CONDUIT-01-SCORING.md fails because it fuses all three into the filename: reordering forces a renumber, retitling forces a rename, and either one breaks every reference. This spec separates them.

| Layer | What it is | Changes when… | Stable? | Lives in |
|---|---|---|---|---|
| Identity | canonical address of a spec / section / requirement | only via deliberate rename | yes (load-bearing) | frontmatter id, heading anchors |
| Order | reading position | you reorganize | no | spec.yaml order: / INDEX.yaml |
| Title | human-readable name | you reword | no | frontmatter title, heading text |

Consequences

  • Filenames carry no identity. Tools resolve a spec by its id through INDEX.yaml, never by path. A file may be renamed or moved freely; only its id is referenced.
  • There are no sequence numbers in identity and no cosmetic display numbers anywhere — the legacy SPEC-PLATFORM-NN numbering is retired, not aliased.
  • No backwards-compatibility layer: when a spec moves or is renamed, every reference is rewritten to the new id in the same change. The corpus is always internally consistent.

Addressing — the three stable ID levels

Spec ID — tier.owner.name

platform.<name>            platform.observability, platform.service-contract
product.<name>             product.lens, product.trained-model
component.<owner>.<name>   component.oracle.apollo, component.parallax.scoring-engine
  • tier is one of platform | product | component.
  • owner segment is present only for component specs and equals the repo name.
  • name is a stable kebab slug, assigned once. It carries no sequence number and is not the title — the human title: lives in frontmatter and may change without touching the id.
  • The spec ID is the only thing depends_on, realizes, GitLab labels, and code annotations point at.

Grammar (enforced by the linter): ^(platform|product)\.[a-z0-9][a-z0-9-]*$ or ^component\.[a-z0-9][a-z0-9-]*\.[a-z0-9][a-z0-9-]*$.
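As a minimal sketch of how a tool might apply this grammar, the snippet below reuses the two patterns verbatim and splits an id into its segments. The function name parse_spec_id and the returned dict shape are illustrative assumptions, not the linter's actual interface.

```python
import re

# The two patterns are copied from the grammar above.
_FLAT = re.compile(r"^(platform|product)\.[a-z0-9][a-z0-9-]*$")
_COMPONENT = re.compile(r"^component\.[a-z0-9][a-z0-9-]*\.[a-z0-9][a-z0-9-]*$")


def parse_spec_id(spec_id: str) -> dict:
    """Split a spec id into tier / owner / name, or raise ValueError.

    Hypothetical helper: the real linter's API is not shown in this spec.
    fullmatch() is used so a trailing newline cannot slip past the `$`.
    """
    if _FLAT.fullmatch(spec_id):
        tier, name = spec_id.split(".")
        return {"tier": tier, "owner": None, "name": name}
    if _COMPONENT.fullmatch(spec_id):
        tier, owner, name = spec_id.split(".")
        return {"tier": tier, "owner": owner, "name": name}
    raise ValueError(f"not a valid spec id: {spec_id!r}")
```

Under these patterns a legacy name such as SPEC-PLATFORM-01 is rejected, as is a component id that omits the owner segment.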

Section ID — <spec-id>#<section-key>

Every ## and ### heading carries a stable hierarchical key via an HTML-comment anchor:

## Lifecycle               <!-- #lifecycle -->
### Freeze transition      <!-- #lifecycle.freeze -->

Referenced as component.parallax.scoring-engine#lifecycle. Reordering a section moves its subsections (lifecycle.*) with it; the anchor is invariant. Section anchors are mandatory on every ##/### — the linter fails a heading without one.
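The anchor rule can be sketched as a small check over a markdown body. This is an illustration only: the key character set (lowercase letters, digits, dots, hyphens) is an assumption inferred from the examples above, and lint_headings is a hypothetical name.

```python
import re

# Matches only level-2 and level-3 headings ("## " / "### ").
HEADING = re.compile(r"^(#{2,3})\s+(.*)$")
# Assumed key alphabet, inferred from "#lifecycle" and "#lifecycle.freeze".
ANCHOR = re.compile(r"<!--\s*#([a-z0-9][a-z0-9.-]*)\s*-->\s*$")


def lint_headings(markdown: str) -> tuple[list[str], list[str]]:
    """Return (section keys in document order, error messages)."""
    keys: list[str] = []
    errors: list[str] = []
    in_fence = False
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        # Skip fenced code so a "## ..." sample inside a fence is not linted.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        heading = HEADING.match(line)
        if heading is None:
            continue
        anchor = ANCHOR.search(heading.group(2))
        if anchor is None:
            errors.append(f"line {lineno}: heading has no anchor")
        elif anchor.group(1) in keys:
            errors.append(f"line {lineno}: duplicate section key {anchor.group(1)}")
        else:
            keys.append(anchor.group(1))
    return keys, errors
```

Because the key lives in the comment rather than the heading text, rewording "Freeze transition" leaves the collected key lifecycle.freeze unchanged.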

Requirement ID — <spec-id>#REQ.<key>

Load-bearing clauses should carry a stable requirement ID so tests and code bind to the requirement, not a line number:

- **#REQ.otel-span-per-request** — every inbound request opens exactly one root OTEL span.

Code/test annotation: # spec: component.parallax.scoring-engine#REQ.weighted-metric. /sdd-spec-gap maps #REQ.* → code → test. Requirement IDs are encouraged, not mandatory; add them to clauses that warrant traceability.
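A rough sketch of how annotations could be collected from source text, assuming the plain form # spec: <spec-id>#REQ.<key> on a comment line. The regex, the kebab-only key alphabet, and the name find_requirement_refs are assumptions for illustration; how /sdd-spec-gap actually scans code is not described here.

```python
import re

# Assumed annotation shape: "# spec: <spec-id>#REQ.<key>".
ANNOTATION = re.compile(
    r"#\s*spec:\s*(?P<spec>[a-z0-9][a-z0-9.-]*)#REQ\.(?P<key>[a-z0-9][a-z0-9-]*)"
)


def find_requirement_refs(source: str) -> list[tuple[int, str, str]]:
    """Return (line number, spec id, requirement key) for each annotation."""
    refs = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = ANNOTATION.search(line)
        if match:
            refs.append((lineno, match.group("spec"), match.group("key")))
    return refs
```

The line number is returned only for reporting; the binding itself is the (spec id, key) pair, which survives edits that shift lines.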

Structure — frontmatter and file layout

Frontmatter

A single-file spec opens with YAML frontmatter:

---
id: component.parallax.scoring-engine
title: "Scoring Engine"
tier: component
owners: [parallax]
status: complete            # stub | partial | complete | adopted | proposed
depends_on: [platform.service-contract]
realizes: [product.fusion]
gitlab_epic: 64             # set by the projection tool, not by hand
---

A directory-per-spec carries the same fields in spec.yaml, plus an ordered list of section keys:

id: platform.observability
title: "Observability — OpenTelemetry Across Services"
tier: platform
status: partial
order: [purpose, spans, metrics, logs]   # reorder = edit this list; no file renames

File layout

developers-environment/specs/
  INDEX.yaml                      # generated registry (do not hand-edit)
  platform/                       # cross-cutting specs only
    service-contract.md           #   small  -> single file
    observability/                #   large  -> directory-per-spec
      spec.yaml
      10-purpose.md               #   <!-- #purpose -->   NN- = gap-numbered sort hint only
      20-spans.md
  products/                       # cross-service domain objects
    lens.md
    README.md
  process/                        # spec_writing_guide, sdd_process, working notes
  ledgers/                        # cross-repo / hotfix / spec-seam ledgers

<repo>/specs/                     # component specs live with their code
  <name>.md          (single)  |  <name>/spec.yaml + NN-*.md  (directory)
  • A single-file spec's filename is the spec id's last segment, kebab, with no SPEC-NN prefix (component.parallax.scoring-engine → scoring-engine.md; product.lens → lens.md). The filename is a convenience, never identity, but keeping it equal to the id's tail keeps the tree legible.
  • NN- file prefixes (10-, 20-, 30-) are gap-numbered human sort hints, never identity. Insert at 15- without a cascade; the canonical order is spec.yaml's order:.
  • Large = a platform/product spec with many independently-reorderable sections → directory. Small/focused (most component specs, all of parallax/prism) → single file with inline anchors.
  • Component specs live in <repo>/specs/; dev-env holds only platform.* and product.*.
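To illustrate why the NN- prefix is only a hint, the sketch below orders a directory spec's files by the order: list rather than by filename. It assumes each section file is named NN-<key>.md with <key> equal to its section key, which matches the 10-purpose.md example but is not stated as a rule; files_in_canonical_order is a hypothetical helper.

```python
import re

# Assumed naming: "<digits>-<section-key>.md", e.g. "10-purpose.md".
SECTION_FILE = re.compile(r"^\d+-(?P<key>.+)\.md$")


def files_in_canonical_order(order: list[str], filenames: list[str]) -> list[str]:
    """Return section files in spec.yaml order, ignoring the NN- prefixes."""
    by_key = {}
    for name in filenames:
        match = SECTION_FILE.match(name)
        if match:  # spec.yaml and other non-section files are skipped
            by_key[match.group("key")] = name
    missing = [key for key in order if key not in by_key]
    unlisted = sorted(key for key in by_key if key not in order)
    if missing or unlisted:
        raise ValueError(
            f"order mismatch: no file for {missing}, not in order: {unlisted}"
        )
    return [by_key[key] for key in order]
```

With order = [purpose, spans, metrics] and a file inserted as 15-metrics.md, the result is 10-purpose.md, 20-spans.md, 15-metrics.md: the list decides, not the prefix.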

Index — the generated registry

specs/INDEX.yaml is the single registry of every spec across dev-env and every component repo. It is generated, never hand-edited, by helpful-scripts/build_spec_index.py, which scans frontmatter (single-file) and spec.yaml (directory) across the workspace.

What it contains

generated_by: build_spec_index.py
count: 73
specs:
  - id: component.oracle.apollo
    title: "Apollo — Observation, Learning & Guidance"
    tier: component
    owners: [oracle]
    home: oracle/specs/apollo/spec.yaml
    status: partial
    depends_on: [platform.service-contract, platform.observability]
    realizes: []
    gitlab_epic: 74

home is the path (relative to the workspace root) where the spec lives — the resolver maps id → home. Because identity is the id and never the path, moving a spec only changes home.

How it is used

  • SDD skills resolve any spec id to its home through INDEX.yaml (no hardcoded paths).
  • It is the only input to the GitLab projection (see #gitlab).
  • It is regenerated on pre-commit and validated in CI (--check). A stale or hand-edited INDEX is a lint failure.

Staging & Apply — the spec-change lifecycle completes unattended

Staging exists to solve exactly one problem: a spec whose canonical home (dev-env) is a different repo from the one whose feature MR carries the change. It is the bridge, not a general ceremony: a spec that lives beside its code never stages.

Scope — what stages, what doesn't

  • Component specs (component.<repo>.*) are edited in place on the feature branch (<repo>/specs/*.md). The MR review gate, atomic landing with the code, and the durable record (MR + history) come for free. No staging dir, no apply entry, ever.
  • Platform/product specs (canonical in dev-env) are staged on the host repo's branch: specs/staging/<spec-change>/ holding the full resulting spec file(s) under specs/ and an apply.json manifest. The Who-&-Why goes in the MR description (a rationale.md is optional for changes that warrant a durable design record); no separate Before/After commentary file — the staged file's diff on the MR is the diff.
  • Multi-repo spec-changes: each satellite repo's branch carries only the specs/staging/<spec-change>/HOST stub. The stub is not ceremony; it is the barrier's merge signal (stub on the default branch == that repo's MR merged). Single-repo changes need no stubs.
  • The host staging dir may also carry transient working artifacts the SDD pipeline reads but the apply tooling ignores: tasks.md (the task plan), issue.json (adopted gap ids), and claims.yml (the spec-change's compiled claim list — written by /sdd-spec-gap, re-checked deterministically by helpful-scripts/sdd_claim_check.py at review). They are session state riding the branch and drop with the staging dir after apply.
  • A spec-change touching no dev-env spec therefore has no staging at all — it's just a branch whose MR edits repo-local specs alongside code.

The canonical dev-env spec is updated by helpful-scripts/sdd_apply_staged.py only after every participating repo's feature MR has merged (the all-merged barrier). The apply is a deterministic, base_blob-guarded full-file copy of content a human already approved on the host MR — the merge approval is the human gate; the apply itself is mechanical and must never depend on a live operator or agent session.

Completion guarantee — event-driven CI apply

  • Every SDD repo's default-branch pipeline runs the sdd-apply CI component (ci-components templates/sdd-apply.yml) whenever specs/staging/*/apply.json exists on the branch. The job runs sdd_apply_staged.py --scan: each staging dir the repo hosts is evaluated against the all-merged barrier; the merge that completes the barrier is, by construction, the pipeline that pushes the assembled spec to dev-env's default branch. Satellites are skipped by design — the host's pipeline owns the apply.
  • Credentials and identity are the platform's existing CI fabric: the group-level GITLAB_TOKEN variable (API probes + dev-env clone/push) and the semantic-release bot identity (GIT_AUTHOR/COMMITTER_EMAIL = $SEMANTIC_RELEASE_BOT_EMAIL). No new secret.
  • The in-session apply remains the fast path. A shipper who observes the final merge applies immediately (/sdd-ship Step 8). The tool is idempotent (no-ops when canonical already equals staged content; a lost push race re-syncs and no-ops), so the session path and CI cannot conflict — whichever runs first wins.
  • A session that enables auto-merge and ends (or an MR approved and merged days later) therefore still results in an applied spec: the merged repo's own pipeline completes it. A merge landing with nobody around no longer strands the staging dir.

Tool invariants

  • Fail loudly on auth. A missing/rejected token is exit-3 fatal, never read as "barrier unmet": the probe must confirm each participating project is visible (HTTP 200 on the project itself) before a 404 on the staging path may mean "not merged". A silent no-op in CI is the failure mode this section exists to kill.
  • Never /tmp. Local manual applies clone dev-env to ~/axonis/.sdd-apply/developers-environment (inside the workspace: correct git identity via the conditional include, correct dirname for generator hooks); CI clones inside its job workspace. The apply commit runs hooks-disabled so environment hooks can never edit the commit.
  • Push race is safe. Two final merges landing near-simultaneously may both see the barrier met; on push rejection the tool re-syncs to origin and re-evaluates — the loser finds the content already applied and no-ops.
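The fail-loudly-on-auth invariant can be read as a small decision function over two HTTP status codes. This is a sketch under stated assumptions: AuthError, EXIT_AUTH, and staging_merged are hypothetical names, and the real probe's API calls are not shown in this spec.

```python
EXIT_AUTH = 3  # the exit code the invariant above assigns to auth failures


class AuthError(Exception):
    """Raised when the probe cannot trust what it sees; maps to EXIT_AUTH."""


def staging_merged(token: str, project_status: int, staging_status: int) -> bool:
    """Decide whether one repo's staging path is on its default branch.

    project_status: HTTP status of a GET on the project itself.
    staging_status: HTTP status of a GET on the staging path.
    """
    if not token:
        raise AuthError("no token configured")
    # Only a confirmed-visible project makes a later 404 meaningful.
    if project_status != 200:
        raise AuthError(f"project not visible (HTTP {project_status})")
    if staging_status == 200:
        return True
    if staging_status == 404:
        return False  # genuinely "not merged yet"
    raise RuntimeError(f"unexpected status on staging path: {staging_status}")
```

A caller would catch AuthError at the top level and exit with EXIT_AUTH, so an unauthenticated run can never be reported as "barrier unmet".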

Pending-apply detector

helpful-scripts/sdd_pending_applies.py is the discoverability backstop (CI is the executor):

  • Enumerates every SDD repo's default branch for specs/staging/*/apply.json via the GitLab API — branch names are irrelevant, so spec-changes that rode a non-sdd/* carrier branch (or whose branch was deleted post-merge) are found.
  • Classifies each spec-change by reusing the apply tool's own logic (imported, not duplicated): pending (barrier met, canonical ≠ staged), applied (canonical == staged; staging dir is droppable residue), waiting (barrier unmet), stale (base_blob drifted — human reconciles; never auto-applied).
  • Default output is a dry-run report; --apply chains each pending finding into sdd_apply_staged.py.
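The four states can be sketched as one pure function. Two assumptions are made loudly here: that base_blob is the git blob SHA-1 of the canonical file the staged content was derived from, and that the checks run in the order applied, waiting, stale, pending. Neither is confirmed by this spec, and classify is a hypothetical name.

```python
import hashlib


def git_blob_sha(content: bytes) -> str:
    """SHA-1 that git assigns to a blob with this content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def classify(canonical: bytes, staged: bytes, base_blob: str, barrier_met: bool) -> str:
    """Classify one staged file; precedence of the checks is an assumption."""
    if canonical == staged:
        return "applied"   # nothing to do; staging dir is droppable residue
    if not barrier_met:
        return "waiting"   # some participating MR has not merged yet
    if git_blob_sha(canonical) != base_blob:
        return "stale"     # canonical moved since staging; a human reconciles
    return "pending"       # safe, mechanical full-file copy
```

Checking equality first is what makes a repeated apply a no-op: once canonical equals staged, every later run lands in "applied" regardless of the barrier or base_blob.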

Discoverability

  • The SDD resume scan ("where are we?") runs the detector alongside the sdd/* branch sweep — a pending spec-change is a first-class resume state, routed straight to the apply.
  • Every MR carrying a staging dir must name the spec-change in its description, regardless of branch name, so humans can trace a merged carrier branch back to its proposal.

GitLab — a generated projection of INDEX.yaml

GitLab is a disposable view of the spec corpus, not a second source of truth. The markdown in git is authoritative; /sdd-epic-sync reconciles GitLab from INDEX.yaml. Nothing about tracking is hand-curated.

What gets projected

  • One epic per spec, all tiers (platform, product, component) — not only the platform specs. The epic title and description are derived from the spec title + status; the epic iid is recorded back into the spec's gitlab_epic field by the sync tool.
  • Scoped labels spec::<id> (e.g. spec::component.oracle.apollo). Scoped labels are mutually exclusive within the spec:: scope and render as a single chip, which is board-friendly and lets a label address every spec. The legacy numeric Spec::01-15 labels are retired.
  • Issues tie to SDD slugs. /sdd-plan opens the issue for a specs/staging/{slug}; /sdd-ship closes it on merge. Each issue is linked to the epic(s) of the spec(s) it touches.

Why it survives reorganization

Every projected artifact keys off the stable spec id. Reordering sections, retitling a spec, or moving its file changes neither the epic nor its labels — only home in the index. The board reflects status: + slug lifecycle, derived, never hand-set.

Publishing — the docs site, a generated projection of the corpus

The spec corpus is served as a browsable documentation site. Like GitLab (#gitlab), the site is a disposable view: built entirely from the markdown in git plus INDEX.yaml, holding no authored content of its own. Anything worth writing lives in the corpus and is projected.

Site shape

  • Engine: MkDocs Material. The site is built and validated by the platform CI pipeline on every commit to main and served by the spec-docs service (#publishing.deployment) — there is no GitLab Pages path (the instance does not serve Pages).
  • Ask-the-corpus chat. Alongside browsing, the service offers an LLM chat grounded in the spec corpus (#publishing.chat) — ask questions about the specs in natural language.
  • Navigation mirrors the index — a section per tier (Platform, Products, Components), one page per registered spec; directory-per-spec specs render as a page group in spec.yaml order:.
  • Start Here comes first. The site's opening section is the SDD tutorial (#publishing.tutorial), before any tier section, and the front page leads with it (#publishing.home) — the site's job is to train as well as reference.
  • Full-text search across the entire corpus.

Front page

The root page (/) is rendered as a newspaper front page — a deliberate editorial metaphor, not a bullet index. Like every other page it holds no authored content: the masthead and motto are fixed chrome, and every headline, byline, and count is derived from INDEX.yaml or git.

  • Masthead + dateline. A fixed masthead ("The Axonis Ledger") and a dateline carrying the build date, a standing motto, and the live spec count — chrome, not corpus content.
  • Lead story = Start Here. The SDD tutorial (#publishing.tutorial) runs above the fold as the front-page lead, so the site's job to train visibly leads its job to reference.
  • Tier columns. One ruled column per tier (Platform, Products, Components), each listing its registered specs with id in ref order — the same projection as the nav, set as newsprint.
  • Latest Builds column. A news rail sourced from the developers-environment git log (recent non-merge commits: date, short sha, subject) — the honest analog of "today's news" for a corpus whose content is specifications, surfacing the latest build/changelog activity. Derived, never hand-authored; absent in a non-git environment.
  • Presentation is a newspaper.css theme scoped to the front page (the derived markdown is emitted as md_in_html islands so .md links still rewrite); interior spec pages keep the standard dark docs theme.

Deployment

  • The site ships as a container image: a multi-stage Dockerfile in developers-environment whose build stage runs the generator + mkdocs build --strict and whose final stage runs the spec-docs service — a Python service (platform.service-contract anatomy) that serves the built static site at / and the chat/MCP surfaces beside it. The image is built and pushed by the platform's shared ci-components pipeline (package-docker) to the project's CI registry path — never a bespoke push job.
  • A helm chart at charts/spec-docs (the ci-components package-helm location contract) templates the Deployment, Service, and Ingress on a dedicated docs hostname (e.g. docs.development.axonis.ai); package-helm publishes it to the project's helm package registry (devel channel on MRs, stable channel on main).
  • The chart is deployed as an ArgoCD Application with values overlaid at deploy time — the platform's only deploy path. The helmfile-baseline repo is not a deploy path; nothing references it.
  • The corpus is read-only through the service: browsing is unauthenticated; the chat surface is authenticated (platform.service-contract authentication) and answers FROM the specs but can never modify one — git remains the only way to change a spec. Conversation history is operational state (ConversationStore), never corpus content.

Generator

  • helpful-scripts/build_spec_site.py assembles the MkDocs source tree from INDEX.yaml — the index is the only input that decides which spec pages are published; an unregistered spec markdown file is never on the site. The Start Here tutorial and the reference docs it links into (#publishing.tutorial) are the sole non-indexed corpus content the generator mounts.
  • Each spec is read from its home. Component specs live in sibling repos: when a sibling checkout is absent (CI builds from a dev-env clone), the generator shallow-clones the owner repo's default branch from GitLab.
  • specs/staging/ dirs, specs/ledgers/, and INDEX.yaml itself are never published — the site shows golden state only, not in-flight proposals or bookkeeping.
  • The generator fails the build on structural integrity errors — a missing home, an empty ref, an order: key with no section file — a red site-build pipeline is a corpus-integrity signal, same class as build_spec_index.py --check.
  • Unresolvable spec id tokens and unknown #fragment keys in body text are collected build warnings (printed with page + token); --strict-refs escalates them to build failure. CI adopts --strict-refs once the corpus's body-text references are clean — strict is the golden state, the warning tier is the migration path, never the destination.

Ask-the-corpus chat

  • The spec-docs service implements the platform chat contract — POST /api/v1/chat per platform.service-contract (chat endpoint pattern) — plus the dual-interface anatomy: /health, /service-info, MCP at /agentspace, REST at /api/v1.
  • Grounded in the corpus, nothing else. The service owns its tool-loop (per platform.axonis-core LLM pattern) over read-only corpus tools — search the built index, fetch a spec page by ref, list specs by tier — so answers cite real spec content addressed by stable ids. The same tools are exposed over MCP for agent callers.
  • LLM via the core client (axonis.llm), configured SPEC_DOCS_LLM_* (platform.service-contract llm-capability). Unconfigured ⇒ HTTP 503 per the chat contract; the site itself never degrades — browse and full-text search stay fully functional.
  • Auth: docs pages, /health, /service-info, and introspection routes are public; the chat REST endpoint and MCP tools authenticate via axonis.auth.authenticator.Authenticator — never an auth-disabled bypass.
  • Browser login (no token paste). The chat widget signs the user in via OIDC Authorization-Code + PKCE (S256) against the platform public Keycloak client (SSO_CLIENT_ID, default public-clients) — never a confidential client, no secret in the browser. The service exposes a public GET /api/v1/auth-config returning only non-secret OIDC parameters (authorization endpoint, token endpoint, client id, scopes) derived from its SSO settings; the widget runs the redirect→code→token exchange itself and attaches the resulting access token as the Bearer on chat calls. The redirect URIs for the docs origin(s) must be allowlisted on the public client (Keycloak config — escalated, not changed by the service).
  • Conversation history follows the chat contract (conversation_id round-trip). The mandated ConversationStore (axonis.memory.conversation) does not exist in axonis-core yet — pre-existing spec-ahead-of-code drift, ledgered; until core ships it the service holds conversation state in-process and swaps to the core store when available.
  • Service config is an AxonisSettings subclass (platform.service-configuration net-new-service pattern); the port is SPEC_DOCS_PORT (default 8012; ServicePorts registry entry tracked as an axonis-core follow-up).
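For the browser-login item above, the S256 step of PKCE is small enough to show directly. The widget itself runs in the browser, so this Python version is only an illustration of the transform defined in RFC 7636; the function names are hypothetical.

```python
import base64
import hashlib
import secrets


def _b64url(data: bytes) -> str:
    """Base64url without '=' padding, as PKCE requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_code_verifier(n_bytes: int = 32) -> str:
    """Random verifier; 32 bytes encode to 43 characters, the minimum length."""
    return _b64url(secrets.token_bytes(n_bytes))


def code_challenge_s256(verifier: str) -> str:
    """code_challenge = BASE64URL(SHA256(ASCII(code_verifier)))."""
    return _b64url(hashlib.sha256(verifier.encode("ascii")).digest())
```

The client sends the challenge with the authorization request and the verifier with the token exchange, which is why no client secret is needed in the browser.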

Cross-linking

  • A spec page's URL is keyed to its ref (/specs/<ref>/), never its path; heading anchors (<!-- #key -->) become URL fragments (/specs/<ref>/#<key>). The web address grammar mirrors the id grammar — platform.spec-governance#publishing.linking has exactly one URL.
  • Every spec id token in body text resolves to a hyperlink via the index.
  • depends_on / realizes render as link lists on each spec page, both directions: the generator derives the reverse edges (required by, realized by) from the index, so a product spec shows which components realize it without hand-maintenance.
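A sketch of the two derivations in this list: the reverse edges and the URL for a ref. It works on the parsed specs list from the index; reverse_edges and spec_url are hypothetical names, and treating the URL ref as the spec id is an assumption based on the example above.

```python
from collections import defaultdict


def reverse_edges(specs: list[dict]) -> tuple[dict, dict]:
    """Derive (required_by, realized_by) from forward edges in the index."""
    required_by = defaultdict(list)
    realized_by = defaultdict(list)
    for spec in specs:
        for target in spec.get("depends_on", []):
            required_by[target].append(spec["id"])
        for target in spec.get("realizes", []):
            realized_by[target].append(spec["id"])
    # Sort for stable page output across builds.
    return (
        {key: sorted(ids) for key, ids in required_by.items()},
        {key: sorted(ids) for key, ids in realized_by.items()},
    )


def spec_url(ref: str) -> str:
    """Map '<spec-id>#<key>' to '/specs/<spec-id>/#<key>'."""
    spec_id, _, key = ref.partition("#")
    url = f"/specs/{spec_id}/"
    return f"{url}#{key}" if key else url
```

Since both functions read only ids and edges, neither output changes when a file moves or a title is reworded.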

Start Here tutorial

  • Authored corpus content at specs/process/start-here/ (directory, gap-numbered files) — a narrative walkthrough of the SDD process for a developer new to it: the spec-first model, the three entry shapes, staging and the host/satellite model, the pipeline (plan → spec-gap → task-plan → implement → review → ship → apply), and one worked example followed end-to-end.
  • Tutorial form, not reference form: written to a persona ("you've been handed a ticket"), each page ends with what the reader can now do. specs/process/sdd_process.md and spec_writing_guide.md remain the reference; the tutorial links into them rather than duplicating their content.

Why it survives reorganization

Every published artifact keys off the stable spec id/ref. Moving a file changes only home in the index; retitling changes only display text; reordering sections changes only order:. URLs, nav, and cross-links are all derived — nothing on the site is hand-curated, so nothing on the site can rot.

Validation

helpful-scripts/build_spec_index.py --check is the gate (pre-commit + CI). It fails when:

  • a single-file spec lacks YAML frontmatter, or any spec lacks an id;
  • an id violates the grammar tier.[owner.]name, or tier disagrees with the id prefix;
  • a component.<owner>.* id names an owner that is not a known component repo;
  • an id is non-unique across the whole corpus;
  • a ##/### heading lacks a <!-- #key --> anchor, or a section key repeats within a spec;
  • a depends_on / realizes entry points at an unknown id;
  • status is not one of stub | partial | complete | adopted | proposed;
  • a directory spec's spec.yaml order: disagrees with its section files (forward work).

Run without --check to regenerate specs/INDEX.yaml.
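Several of the checks above are cross-spec and can be sketched over the parsed entries. This is an illustration, not the behavior of build_spec_index.py: check_corpus and its message wording are assumptions, and only the uniqueness, tier-prefix, status, and unknown-reference rules are covered.

```python
from collections import Counter

STATUSES = {"stub", "partial", "complete", "adopted", "proposed"}


def check_corpus(specs: list[dict]) -> list[str]:
    """Return error messages; an empty list means these checks pass."""
    errors = []
    counts = Counter(spec["id"] for spec in specs)
    known = set(counts)
    for spec_id, n in sorted(counts.items()):
        if n > 1:
            errors.append(f"{spec_id}: id is not unique ({n} specs)")
    for spec in specs:
        spec_id = spec["id"]
        # The tier field must agree with the first id segment.
        if spec.get("tier") != spec_id.split(".", 1)[0]:
            errors.append(f"{spec_id}: tier {spec.get('tier')!r} disagrees with id prefix")
        if spec.get("status") not in STATUSES:
            errors.append(f"{spec_id}: invalid status {spec.get('status')!r}")
        for field in ("depends_on", "realizes"):
            for target in spec.get(field, []):
                if target not in known:
                    errors.append(f"{spec_id}: {field} points at unknown id {target}")
    return errors
```

The unknown-id branch is the one the migration leans on: with no alias layer, any reference left pointing at a retired name shows up here.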

Migration from the legacy scheme

The cutover is one-way and complete — no alias layer, no cosmetic numbers (see #principle). The legacy SPEC-PLATFORM-NN files and repo-local SPEC-NN files are converted in place:

Relocation

  • Per-service platform specs move into their owning repo as component.<owner>.<name>: oracle-gateway, apollo → oracle; alerting → sentinel; conduit, beacon, titan, xanadu, postern → their repos; the per-service sections of the old "service implementations" spec split into parallax / prism / cortex / fedai-rest.
  • Genuinely cross-cutting specs stay in developers-environment/specs/platform/: axonis-core, service-contract, axonis-client, service-configuration, ingress-routing, observability, devops-cicd, and spec-governance (this spec). relay stays at specs/platform/relay.md permanently as a non-normative use-case pointer — it will never have a repo and never becomes component.relay.* (decision 2026-06-10, simplicity#124: device write-back is a use case of the Conduit Effect Engine; normative home component.conduit.effect-engine#relay-write-pattern).
  • All domain-object specs stay in developers-environment/specs/products/.

Reference rewrite

Every reference is rewritten to the new id in the same change: cross-spec links, symbol_to_spec.yml, SDD skills, CLAUDE.md, sdd_apply_staged.py, and code annotations. Because there is no compatibility shim, the linter's unknown-id check is what proves the rewrite is complete.