Coordination Ledger Options for Composable Architecture
Status & scope
- Status: DRAFT — needs design completion (advisory options analysis; no decision selected yet)
- Layer: cross-engine architecture
- Date: 2026-04-30
- Companion specs: fusion-adi-integration, wire-message-families
- Purpose: Frame the drift / audit / state-discipline problem and lay out the options for a coordination layer. This document presents alternatives with tradeoffs; it does not select among them.
Executive summary
Composable architectures pay a recurring audit-quality tax: silent schema drift between services, diffuse audit trails that make cross-service workflow reconstruction forensic rather than mechanical, and undefined workflow states that emerge from message timing instead of declared invariants. This document frames the problem, surveys the modern coordination/audit frameworks (Temporal, Restate, Confluent + Schema Registry, NATS JetStream, EventStoreDB, immudb, AWS QLDB, and others), and lays out four viable paths for the coordination layer — each presented with honest tradeoffs against Axonis-specific constraints (federation invariants, edge deployability, Python-first, existing UDS/RabbitMQ infrastructure). It does not recommend a path. §6 is the decision-criteria table; §9 is the open questions; §7b is a worked end-to-end example showing how all the primitives compose. Reading time: 15–20 minutes; decision conversation: 30–45 minutes.
1. Problem statement
A composable service architecture optimises for service-team independence. The cost is three system-level pathologies that emerge as the system grows:
| Pathology | Definition | How it surfaces |
|---|---|---|
| Schema drift | Two services interpret the same concept differently as either evolves | Silent contract breaks; consumers fail in production after a producer update |
| Audit diffusion | No single ordered record of what happened across services | Cross-service workflow reconstruction is forensic, not mechanical |
| Undefined workflow states | Cross-service workflows reach states the system never declared | Workflows go off the rails in operation; no enforceable invariants between services |
Constraints any solution must respect for Axonis specifically:
- Federation invariants — data stays at the source; cross-organisational audit is via signed evidence, not shared infrastructure
- Edge deployability — must be runnable on a Lookup-Light Edge Node (CL-04 Profile 1)
- Existing infrastructure — Xanadu/RabbitMQ as the federation transport, UDS/Elastic as the search-shaped object store
- Python-first — services are predominantly Python; non-Python additions create operational asymmetry
2. Workflow / state-machine engines surveyed
| System | Schema enforcement | Audit log | State machine | Federation | Footprint | Notes |
|---|---|---|---|---|---|---|
| Temporal | partial (SDK-typed activities) | strong (workflow history) | strong (workflows ARE state machines) | single-cluster | heavy (Cassandra/PG backend, broker tier) | Production at Uber, Snap, DoorDash, Coinbase. Workflow code IS the spec. |
| Restate | strong (typed handlers) | strong (event-sourced) | strong (durable execution) | single-cluster | medium (single binary) | Newer (2024 GA), opinionated, well-engineered. |
| Camunda 8 / Zeebe | medium (BPMN typing) | strong | strong (BPMN-modelled) | single-cluster | heavy (JVM, brokers) | Mature; visually-designed workflows. |
| AWS Step Functions | medium (JSON Schema) | strong | strong | none — AWS-only | managed | Excellent in AWS; vendor lock. |
| Apache Airflow / Prefect | weak | medium | DAG (not state machine) | single-cluster | medium-heavy | Wrong shape — batch DAGs, not transactional workflow. |
| Akka / Microsoft Orleans | weak | weak | actor mailboxes | yes (cluster sharding) | medium | Different paradigm — message-passing actors, not coordination. |
| Dapr Workflow | partial (Durable Task Framework under the hood) | strong | strong | partial (sidecar pattern crosses boundaries) | medium (Dapr runtime per service) | Modular building blocks; brings Dapr surface as a whole. |
3. Append-only / ledger backing stores surveyed
| System | Native append-only | Cryptographic verification | Schema enforcement | Python ecosystem | Federation fit | Footprint |
|---|---|---|---|---|---|---|
| ElasticSearch (current UDS choice) | by convention | none | optional (mappings) | strong | partial (cross-cluster replication) | heavy (JVM, sharded cluster) |
| AWS QLDB | yes, native | strong (Merkle-tree, signed) | weak (Amazon Ion) | medium | none — AWS-only | managed |
| immudb | yes, native | strong (Merkle, key-value-style) | weak | strong (Python client) | partial | small (single Go binary) |
| EventStoreDB | yes, native | weak | medium | medium | weak | medium (.NET on Linux) |
| Trillian / Sigstore Rekor | yes, native | strong (transparency log; powers Certificate Transparency) | weak | medium | strong (purpose-built for distributed audit) | medium |
| PostgreSQL insert-only + triggers | by convention | by convention (hash chain via trigger) | strong (typed cols + JSONB) | very strong | partial | medium |
| SQLite + WAL, insert-only | by convention | by convention (hash chain via library) | strong (typed cols + JSON) | trivial — stdlib | per-node, replicate via outbox | tiny — single file |
| Plain JSONL with hash chain | yes, by construction | strong (one-line-per-record hash chain in pure Python) | enforced by Python validator | trivial | trivial — file replication | none beyond filesystem |
| Confluent Kafka + Schema Registry | yes (compacted topics ≠ append-only; raw topics yes) | weak | very strong | strong | medium (MirrorMaker) | heavy (KRaft/Zookeeper, brokers, registry) |
| NATS JetStream + KV | yes | weak | optional | strong | strong (leaf nodes for federation) | small |
| TigerBeetle | yes | strong (financial ledger) | very narrow (financial transfer schema only) | partial | weak | small but inflexible |
4. Schema registry options surveyed
| System | Maturity | Python | Drift enforcement | Coupling |
|---|---|---|---|---|
| Confluent Schema Registry | very mature | yes | strong | tight to Kafka |
| Apicurio | mature | yes | strong | broker-agnostic |
| AWS Glue Schema Registry | mature | yes | strong | AWS-only |
| In-tree JSON Schema files + jsonschema library | trivial | trivial | strong (validate at write time) | none — pure Python |
5. Four paths forward, each viable
The surveys reveal that no single product solves all three pathologies for federated systems. Four distinct paths are coherent, each making different tradeoffs.
Path A — Pure-Python primitives in axonis-core (no new services)
Shape
Three libraries added to axonis-core:
- Schema registry as a directory of versioned JSON Schema files plus a Python validator
- Hash-chained coordination ledger backed by JSONL or SQLite (configurable)
- Declarative state machine library (data-defined; state derives from ledger)
All pure Python; ~1,000 lines per primitive; no external service dependencies; runs on any Python 3.11+ environment.
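A minimal sketch of the hash-chained JSONL ledger primitive listed above, assuming each record's hash is SHA-256 over a canonical JSON encoding. The function names (`canonical_hash`, `append_event`, `verify_chain`) and the record layout are illustrative assumptions, not the axonis-core API.

```python
import hashlib
import json
import os
import tempfile


def canonical_hash(record: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the record, excluding this_hash."""
    body = {k: v for k, v in record.items() if k != "this_hash"}
    encoded = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return "sha256:" + hashlib.sha256(encoded).hexdigest()


def append_event(path: str, event: dict) -> dict:
    """Append one event, linking it to the hash of the last record in the file."""
    prev_hash = None
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            lines = [line for line in fh if line.strip()]
        if lines:
            prev_hash = json.loads(lines[-1])["this_hash"]
    record = dict(event, prev_hash=prev_hash)
    record["this_hash"] = canonical_hash(record)
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def verify_chain(path: str) -> int:
    """Recompute every hash and link; return the record count or raise ValueError."""
    expected_prev = None
    count = 0
    with open(path, encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            record = json.loads(line)
            if record["prev_hash"] != expected_prev:
                raise ValueError(f"line {line_no}: prev_hash does not match previous record")
            if record["this_hash"] != canonical_hash(record):
                raise ValueError(f"line {line_no}: this_hash does not match record content")
            expected_prev = record["this_hash"]
            count += 1
    return count


# Usage: two events in a throwaway file, then a full verification pass.
ledger_path = os.path.join(tempfile.mkdtemp(), "ledger.jsonl")
append_event(ledger_path, {"event_id": "evt_001", "schema_id": "composition.requested.v1"})
append_event(ledger_path, {"event_id": "evt_002", "schema_id": "coordination.lease_claimed.v1"})
record_count = verify_chain(ledger_path)
```

Re-reading the whole file on every append keeps the sketch short; a real writer would likely cache the head hash, which is one place the single-file throughput bound noted under Tradeoffs would show up.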
Tradeoffs
- ✅ Zero operational additions; no Kafka/Temporal/Confluent stack
- ✅ Edge-deployable — runs on Lookup-Light Edge Nodes
- ✅ Federation-native — each participant runs its own ledger; cross-participant replication via existing RabbitMQ outbox
- ✅ All artefacts are JSON / JSONL — auditable with cat and jq
- ⚠️ Maturity is what we ship — no community ecosystem to lean on
- ⚠️ Ledger throughput is bounded by single-file or single-SQLite-connection; high-volume nodes may need rotation strategy
- ⚠️ State-machine library is in-house — semantic precedents (BPMN, statecharts) require reimplementation
- ⚠️ Audit-quality only as good as the disciplined-write pattern (writers must use the library; bypass is possible)
Implementation surface
- Phase 1 (week 1-2): schema registry + 6-8 starter coordination schemas
- Phase 2 (week 2-3): hash-chained ledger with JSONL + SQLite backends
- Phase 3 (week 3-4): state-machine library + reference workflow
- Phase 4 (week 4-6): wire to existing VRS-screening pipeline; ledger-driven replay
- Phase 5 (week 6+): cross-node replication via RabbitMQ outbox
Path B — Adopt Temporal (or Restate) + Confluent Schema Registry
Shape
Run Temporal Cluster (workflow + audit) alongside existing services. Run Confluent Schema Registry alongside RabbitMQ for typed event contracts. Services adopt Temporal SDK for workflows and use Schema Registry for cross-service event contracts.
Tradeoffs
- ✅ Battle-tested at scale — Uber, Snap, DoorDash use Temporal; Confluent is the de facto schema-registry standard
- ✅ Mature SDKs in many languages — Temporal's Python SDK is well-engineered
- ✅ Workflow-as-code semantics are well-understood; less in-house design risk
- ⚠️ Operational footprint — Temporal requires Cassandra or PostgreSQL backend, broker tier, frontend service; Confluent requires registry deployment
- ⚠️ Federation gap — Temporal is single-cluster by design; cross-cluster federation requires running multiple Temporal Clusters with separate workflow namespaces (no native cross-cluster workflow concept)
- ⚠️ Edge deployment problematic — Temporal Cluster is far too heavy for a Lookup-Light Edge Node
- ⚠️ Two new services to operate; new SLAs to maintain
- ⚠️ Vendor velocity matters — Temporal is well-funded; Restate is younger
Variants
- B1: Temporal + Confluent (heavyweight, mature)
- B2: Restate + Apicurio (lighter, younger, less proven)
- B3: Step Functions + Glue Registry (managed; AWS-only; vendor lock)
Path C — NATS JetStream as coordination substrate, schema files in-tree
Shape
Adopt NATS JetStream as the coordination layer alongside existing RabbitMQ. NATS subjects act as schema namespaces; KV bucket holds composition state. Schema discipline via in-tree JSON Schema files validated at publish time. State machine library similar to Path A.
Tradeoffs
- ✅ Federation-native — leaf nodes designed for organisation-spanning topologies
- ✅ Lighter footprint than Kafka — single Go binary; well-instrumented
- ✅ Combines pub/sub + work queues + KV in one runtime; closest to a "Linda tuple space" feel
- ⚠️ Two messaging substrates to operate — RabbitMQ (Xanadu) AND NATS, OR migrate Xanadu off RabbitMQ
- ⚠️ NATS doesn't ship a workflow primitive — state machine still in-house
- ⚠️ Cryptographic chain for audit is not native; would need to be added
- ⚠️ Operational learning curve — NATS conventions differ from RabbitMQ
Path D — Layer on existing UDS/Elastic with disciplined append-only conventions
Shape
Treat the existing UDS as the coordination ledger. Add per-event signed-hash chaining as a UDS object property. Schema registry as files in axonis-core. State machine library similar to Path A.
Tradeoffs
- ✅ Zero new infrastructure — uses what's already deployed
- ✅ Searchable — Elastic indices are query-rich
- ⚠️ Elastic is not append-only by construction — append discipline is convention, not enforcement
- ⚠️ Tamper detection requires post-hoc verification; cannot prevent tampering at the storage layer
- ⚠️ Heavy footprint — Elastic stays heavy; not Edge-deployable
- ⚠️ Federation across organisations via Elastic cross-cluster replication is operationally complex
6. Decision criteria
Questions whose answers narrow the choice:
| Question | If answer is X, prefer | If answer is Y, prefer |
|---|---|---|
| Must coordination work on a 100MB Edge Node? | Path A or C | Path B excluded |
| Must we operate the coordination layer ourselves? | Path A (least op cost) | Path D OK; B costly |
| Is federation across organisations a load-bearing requirement? | Path A or C | Path B partial; D weak |
| Do we need named workflow primitives that already exist? | Path B (Temporal) | Path A (build small library) |
| What's our tolerance for new services? | Path A or D | Path B (2-3 new services) |
| What's our tolerance for in-house code? | Path B (least in-house) | Path A (most in-house) |
| Is rapid time-to-first-running-system the priority? | Path A (4-6 weeks pure Python) | Path B (similar wall-clock but heavier ops) |
| Is mature community ecosystem important? | Path B | Path A (we ship the maturity) |
| Is Elastic-the-ledger durable enough for regulator-grade audit? | If yes, Path D | If no, Paths A, B, C |
7. Cross-cutting design points (apply to whichever path)
These elements are required regardless of which path is chosen.
7.1 Schema versioning rule
Convention namespace.entity.vN, where N is the major version, with rules:
- Backward-compatible (additive) changes increment the minor version; the schema_id keeps the same vN
- Breaking changes require a new major version (a new vN) plus a migration plan
- The validator refuses to write events whose schema_id is not in the registry
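The write-time refusal rule can be sketched as a small gate in front of the ledger. This is a sketch under assumptions: `REGISTRY` is a hypothetical in-memory stand-in for the in-tree schema files, and the hand-rolled required-field check stands in for full JSON Schema validation (which the `jsonschema` library would provide).

```python
import re

# Hypothetical in-memory registry; the design keeps these as versioned
# JSON Schema files in-tree. Only required payload fields are modelled here.
REGISTRY = {
    "coordination.lease_claimed.v1": {
        "required": {"lease_holder": str, "lease_window_seconds": int, "claimed_at": str},
    },
}

# namespace.entity.vN, with N a positive integer.
SCHEMA_ID_PATTERN = re.compile(r"^[a-z_]+\.[a-z_]+\.v[1-9][0-9]*$")


class SchemaRejected(ValueError):
    """Raised when an event may not be written to the ledger."""


def validate_for_write(event: dict) -> None:
    schema_id = event.get("schema_id", "")
    if not SCHEMA_ID_PATTERN.match(schema_id):
        raise SchemaRejected(f"malformed schema_id: {schema_id!r}")
    schema = REGISTRY.get(schema_id)
    if schema is None:
        raise SchemaRejected(f"schema_id not in registry: {schema_id}")
    payload = event.get("payload", {})
    for field, expected_type in schema["required"].items():
        if field not in payload:
            raise SchemaRejected(f"{schema_id}: missing payload field {field!r}")
        if not isinstance(payload[field], expected_type):
            raise SchemaRejected(f"{schema_id}: {field!r} is not {expected_type.__name__}")


# Usage: a well-formed event passes silently; anything else raises.
validate_for_write({
    "schema_id": "coordination.lease_claimed.v1",
    "payload": {
        "lease_holder": "agent_vrs_screener_01",
        "lease_window_seconds": 1800,
        "claimed_at": "2026-04-30T14:00:00.250Z",
    },
})
```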
7.2 Causal predecessor field
Every cross-service event carries a causal_predecessor field — the event ID this one logically follows. Enables replay and audit reconstruction regardless of which storage path is chosen.
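As an illustration of how causal_predecessor supports reconstruction, the sketch below orders an unordered batch of events by following the links from the root. It assumes a single linear chain per composition (one root, at most one successor per event); the function name `causal_order` is hypothetical.

```python
def causal_order(events: list[dict]) -> list[dict]:
    """Order events root-first by following causal_predecessor links."""
    by_predecessor = {}
    for event in events:
        pred = event["causal_predecessor"]
        if pred in by_predecessor:
            raise ValueError(f"two events claim predecessor {pred!r}")
        by_predecessor[pred] = event
    if None not in by_predecessor:
        raise ValueError("no root event (causal_predecessor is null)")
    ordered = []
    current = by_predecessor[None]
    while current is not None:
        ordered.append(current)
        current = by_predecessor.get(current["event_id"])
    if len(ordered) != len(events):
        raise ValueError("some events are not reachable from the root")
    return ordered


# Usage: events arrive out of order and are put back in causal order.
shuffled = [
    {"event_id": "evt_003", "causal_predecessor": "evt_002"},
    {"event_id": "evt_001", "causal_predecessor": None},
    {"event_id": "evt_002", "causal_predecessor": "evt_001"},
]
ordered_ids = [e["event_id"] for e in causal_order(shuffled)]
```

A branching workflow would need a topological sort instead of a linear walk; the linear case is chosen here only because it matches the worked example's shape.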
7.3 Composition ID
A workflow instance has a unique composition_id. All events for that workflow carry it. The state machine queries the ledger by composition_id to derive current state.
7.4 Replay semantics
The audit/replay claim requires that re-running events in their stored order yields identical state. This is structural in Paths A, B, C; conventional in Path D.
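The replay property amounts to state being a pure fold over the stored events. A minimal sketch, assuming a hypothetical and deliberately abbreviated transition table (three transitions only, not the full workflow):

```python
from functools import reduce

# Hypothetical, abbreviated table: (current state, schema_id) -> next state.
TRANSITIONS = {
    (None, "composition.requested.v1"): "requested",
    ("requested", "coordination.lease_claimed.v1"): "lease_open",
    ("lease_open", "composition.closed.v1"): "closed",
}


def apply_event(state, event):
    """Pure step function: no clock, no I/O, so the same input gives the same output."""
    key = (state, event["schema_id"])
    if key not in TRANSITIONS:
        raise ValueError(f"undeclared transition from {state!r} on {event['schema_id']}")
    return TRANSITIONS[key]


def replay_state(events):
    """Derive current state by folding the events in stored order."""
    return reduce(apply_event, events, None)


stored = [
    {"schema_id": "composition.requested.v1"},
    {"schema_id": "coordination.lease_claimed.v1"},
    {"schema_id": "composition.closed.v1"},
]
first_run = replay_state(stored)
second_run = replay_state(stored)
```

Because `apply_event` depends only on its arguments, re-running the fold cannot diverge. An event that does not fit the table raises rather than producing an undeclared state.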
7.5 Federation outbox
Whichever path is chosen, cross-organisational replication is an outbox pattern over Xanadu/RabbitMQ. The producing node packages event slices (with verification metadata); the receiving node imports and verifies. The path choice affects what verification is possible.
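One possible shape for the outbox envelope, sketched for a hash-chained ledger (Path A style). The envelope fields (`anchor_hash`, `head_hash`, `slice_digest`) and both function names are assumptions for illustration. The receiver here checks only linkage and the slice digest; re-hashing each record's content and any signature check are assumed to happen separately.

```python
import hashlib


def _digest(records):
    joined = "".join(r["this_hash"] for r in records).encode("utf-8")
    return hashlib.sha256(joined).hexdigest()


def package_slice(records):
    """Producer side: wrap consecutive ledger records with verification metadata."""
    return {
        "anchor_hash": records[0]["prev_hash"],  # what the slice attaches to
        "head_hash": records[-1]["this_hash"],
        "count": len(records),
        "slice_digest": _digest(records),
        "records": records,
    }


def import_slice(envelope, last_imported_hash):
    """Receiver side: accept only a contiguous slice that extends what was already imported."""
    records = envelope["records"]
    if envelope["anchor_hash"] != last_imported_hash:
        raise ValueError("slice does not attach to the last imported record")
    if len(records) != envelope["count"]:
        raise ValueError("record count does not match envelope")
    expected_prev = last_imported_hash
    for record in records:
        if record["prev_hash"] != expected_prev:
            raise ValueError(f"chain break at {record['event_id']}")
        expected_prev = record["this_hash"]
    if expected_prev != envelope["head_hash"] or _digest(records) != envelope["slice_digest"]:
        raise ValueError("slice metadata does not match records")
    return expected_prev  # new head on the receiving node


records = [
    {"event_id": "evt_002", "prev_hash": "sha256:aaa", "this_hash": "sha256:bbb"},
    {"event_id": "evt_003", "prev_hash": "sha256:bbb", "this_hash": "sha256:ccc"},
]
envelope = package_slice(records)
new_head = import_slice(envelope, last_imported_hash="sha256:aaa")
```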
7a. Authorization, identity, and lease primitives
The coordination ledger composes with — does not replace — Axonis's existing authorization model. Per Invariant 1, UDS is the sole ABAC authority. The coordination layer records what was attempted, who attempted it, and what UDS decided; it does not make authorization decisions.
This section describes how four orthogonal concerns (agents, leasing, JWT-bearing actions, ABAC/RBAC) fit on top of any of the four paths.
7a.1 Agents as first-class actors
Agents (LLM-driven workers, autonomous task workers, or any non-human actor) emit coordination events identically to services. The schema registry includes a base coordination.actor.v1 schema describing actor metadata, which agent-specific schemas extend.
Required actor fields on every coordination event:
actor.type : enum(service, agent, human)
actor.id : str (canonical identifier)
actor.version : semver (for replay determinism)
Agent-specific extension fields:
agent.model_id : str (e.g. "claude-opus-4-7")
agent.prompt_hash : SHA-256 of the prompt template + context
agent.temperature : float
agent.seed : int (when set; null otherwise)
agent.response_hash : SHA-256 of the agent's output
agent.tool_call_chain : list of tool invocations within the action
Replay determinism for agent events: the ledger record includes agent.response_hash. On replay, the cached response is looked up and checked against that hash, then re-used; the model is not re-invoked. This preserves replay determinism without requiring agent runtimes to be deterministic.
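A sketch of the cached-response lookup, assuming agent outputs are kept in a content-addressed store keyed by the same SHA-256 value recorded in agent.response_hash. `ResponseCache` and its methods are hypothetical; where the cache lives is not specified by this document.

```python
import hashlib


def _sha256_tag(text: str) -> str:
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


class ResponseCache:
    """Hypothetical content-addressed store for agent outputs."""

    def __init__(self):
        self._blobs = {}

    def put(self, response_text: str) -> str:
        """Store the output at write time; the returned value goes into agent.response_hash."""
        digest = _sha256_tag(response_text)
        self._blobs[digest] = response_text
        return digest

    def get_verified(self, response_hash: str) -> str:
        """Fetch the output at replay time and confirm it still matches the recorded hash."""
        text = self._blobs[response_hash]
        if _sha256_tag(text) != response_hash:
            raise ValueError("cached response does not match agent.response_hash")
        return text


cache = ResponseCache()
recorded_hash = cache.put('{"tool": "claim_screening_task"}')
replayed = cache.get_verified(recorded_hash)  # no model call on replay
```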
Invariant 6 alignment ("AI assists, humans attest"): the state machine declares source constraints on transitions. Transitions to terminal states (attested, closed) are restricted to actor.type: human. Agents may propose; only humans freeze.
7a.2 Leasing — explicit lifecycle as state-machine transitions
Leasing maps directly to ledger-recorded state transitions. No separate lease store; current lease state is derived from the ledger.
Standard lease lifecycle:
States:  unclaimed → leased → in_progress → complete
                        ↓
                   lease_expired → unclaimed (retry)
Transitions:
  unclaimed → leased
    on_event:  coordination.lease_claimed.v1
    payload:   { lease_holder, jwt_jti, lease_window_seconds, claimed_at }
  leased → in_progress
    on_event:  coordination.work_started.v1
  leased → lease_expired
    on_event:  coordination.lease_expired.v1
    triggered: by timer service when (claimed_at + window) < now
               AND no work_completed event observed
  lease_expired → unclaimed
    on_event:  coordination.lease_released.v1
Properties:
- Lease state derives from the ledger — no separate lease table
- Lease expiry is itself a recorded event (durable, audit-quality)
- Concurrent-claim prevention via state machine refusal
Path-specific lease implementation:
- Path A (pure Python): small timer worker reads ledger, emits expiry events (~100 LOC)
- Path B (Temporal/Restate): native; Temporal's durable workflow timers are the canonical lease mechanism
- Path C (NATS JetStream): KV bucket TTL provides the timer natively
- Path D (UDS/Elastic): scheduled job sweeps for stale leases
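The lease lifecycle and the Path A timer worker can be sketched together. The function names are hypothetical, and the `in_progress → complete` row is an assumption: the lifecycle names the `complete` state and a work_completed event but does not spell out that transition.

```python
from datetime import datetime, timedelta

LEASE_TRANSITIONS = {
    ("unclaimed", "coordination.lease_claimed.v1"): "leased",
    ("leased", "coordination.work_started.v1"): "in_progress",
    ("leased", "coordination.lease_expired.v1"): "lease_expired",
    ("lease_expired", "coordination.lease_released.v1"): "unclaimed",
    # Assumed row, not listed in the lifecycle above.
    ("in_progress", "coordination.work_completed.v1"): "complete",
}


def parse_utc(ts: str) -> datetime:
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def lease_state(events: list) -> str:
    """Derive lease state from ledger events; there is no separate lease table."""
    state = "unclaimed"
    for event in events:
        key = (state, event["schema_id"])
        if key not in LEASE_TRANSITIONS:
            raise ValueError(f"refused: {event['schema_id']} is not valid in state {state!r}")
        state = LEASE_TRANSITIONS[key]
    return state


def claim(events: list, holder: str, claimed_at: str, window_seconds: int) -> None:
    """Append a claim only if the state machine accepts it (concurrent-claim refusal)."""
    event = {
        "schema_id": "coordination.lease_claimed.v1",
        "payload": {"lease_holder": holder, "claimed_at": claimed_at,
                    "lease_window_seconds": window_seconds},
    }
    lease_state(events + [event])  # raises unless the lease is currently unclaimed
    events.append(event)


def expiry_event_if_due(events: list, now: datetime):
    """Timer-worker step: return an expiry event when the window has passed, else None."""
    if lease_state(events) != "leased":  # work already started, so no expiry here
        return None
    payload = next(e["payload"] for e in reversed(events)
                   if e["schema_id"] == "coordination.lease_claimed.v1")
    deadline = parse_utc(payload["claimed_at"]) + timedelta(seconds=payload["lease_window_seconds"])
    if deadline < now:
        return {"schema_id": "coordination.lease_expired.v1",
                "payload": {"expired_at": now.isoformat()}}
    return None


lease_events = []
claim(lease_events, "agent_vrs_screener_01", "2026-04-30T14:00:00.250Z", 1800)
```

The expiry itself is returned as an event for the caller to append, so the ledger stays the only source of lease state.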
7a.3 JWT-bearing actions
Coordination events carry identity context as a required header field, enforced by schema registry:
{
  "schema_id": "coordination.lease_claimed.v1",
  "actor": {
    "type": "agent",
    "id": "agent_vrs_screener_01",
    "version": "1.4.2"
  },
  "auth": {
    "jwt_jti": "abc123...",
    "subject": "user_smith@firm.example",
    "scopes": ["fusion_operator", "lens_authoring"],
    "issuer": "idp.firm.example",
    "issued_at": "2026-04-30T10:00:00Z",
    "expires_at": "2026-04-30T11:00:00Z",
    "delegation_chain": ["service_acct_axonis", "user_smith@firm.example"]
  },
  "payload": { ... }
}
What this gives:
- Audit-grade record of who acted under which credential at what time
- Replay can verify the JWT was valid at the original time, not at replay time (since tokens expire)
- Repudiation defence: chained hash + signed JWT-issuance metadata makes "I didn't do that" claims structurally verifiable
JWT verification location: at the service boundary, not at the coordination-layer write. The ledger records the JWT context; it does not validate the JWT signature itself. Validation happens once at the service that consumes the request; the verified claims are then carried in the event.
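The "valid at the original time" check on replay reduces to a timestamp comparison against the recorded auth block. A minimal sketch, assuming the event also carries a `ts_utc` timestamp (as the events in §7b do) and that an expiry equal to the event time counts as expired; the function name is hypothetical.

```python
from datetime import datetime


def parse_utc(ts: str) -> datetime:
    # Replace the trailing Z so older Python versions can parse the timestamp.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def jwt_valid_at_event_time(event: dict) -> bool:
    """True if the recorded token window covers the event's own timestamp.

    This checks recorded claims only; signature validation happened at the
    service boundary and is not repeated here.
    """
    auth = event["auth"]
    event_time = parse_utc(event["ts_utc"])
    return parse_utc(auth["issued_at"]) <= event_time < parse_utc(auth["expires_at"])


event = {
    "ts_utc": "2026-04-30T10:30:00.000Z",
    "auth": {
        "issued_at": "2026-04-30T10:00:00Z",
        "expires_at": "2026-04-30T11:00:00Z",
    },
}
valid_then = jwt_valid_at_event_time(event)
```

The comparison deliberately ignores the wall clock at replay time, which is the point: a token that has long since expired can still be shown to have been valid when the event was written.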
7a.4 ABAC and RBAC integration
The coordination ledger does not enforce authorization. UDS does (Invariant 1). The flow:
1. Actor presents request + JWT to a service
2. Service verifies JWT signature and basic claims
3. Service calls UDS for ABAC evaluation
   - subject attributes (from JWT claims + UDS profile lookup)
   - requested action (from event schema_id)
   - resource attributes (from event payload)
   - environmental attributes (time, geographic, classification level)
4a. UDS returns ALLOW → service emits the coordination event; ledger records it
4b. UDS returns DENY → service emits coordination.access_denied.v1 → ledger records the denial
5. State machine validates the event against current state; advances or rejects
Two patterns supported:
Deny-by-omission — when ABAC denies and audit of denials is not regulator-required, no domain event is emitted. The ledger only records denials when explicit auditing is required.
Authorization-as-context — the auth block in each event lets downstream services re-verify claims without re-querying UDS. Reduces UDS load; verifies independently.
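Steps 3 to 4b of the flow, including the deny-by-omission switch, can be sketched as one service-side function. `handle_request`, the `audit_denials` flag, and the `uds_evaluate` callable are hypothetical stand-ins (the real UDS interface is not described here), JWT verification is assumed to have happened already, and environmental attributes are omitted for brevity.

```python
def handle_request(event: dict, uds_evaluate, ledger: list, audit_denials: bool = True) -> str:
    """Ask UDS for a decision, then record the event or the denial."""
    outcome = uds_evaluate(
        subject=event["auth"]["subject"],  # subject attributes
        action=event["schema_id"],         # requested action
        resource=event["payload"],         # resource attributes
    )
    if outcome == "ALLOW":
        ledger.append({**event, "abac_decision": {"outcome": "ALLOW"}})
    elif audit_denials:
        # Step 4b: the denial itself becomes a ledger record.
        ledger.append({
            "schema_id": "coordination.access_denied.v1",
            "actor": event["actor"],
            "auth": event["auth"],
            "payload": {"denied_schema_id": event["schema_id"]},
            "abac_decision": {"outcome": "DENY"},
        })
    # With audit_denials=False a DENY leaves no record (deny-by-omission).
    return outcome


def stub_uds(subject, action, resource):
    """Stand-in policy for the example: only lease claims are allowed."""
    return "ALLOW" if action == "coordination.lease_claimed.v1" else "DENY"


request = {
    "schema_id": "coordination.lease_claimed.v1",
    "actor": {"type": "agent", "id": "agent_vrs_screener_01"},
    "auth": {"subject": "service_acct_axonis_screener"},
    "payload": {"lease_holder": "agent_vrs_screener_01"},
}
ledger = []
decision = handle_request(request, stub_uds, ledger)
```

The service never decides anything itself; it only records what UDS returned, which keeps the sketch consistent with Invariant 1.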
Schema-side enforcement of authorization context:
- Each schema declares required_scopes in registry metadata (not event payload)
- Validator at write time checks the actor's JWT contains required scopes
- Mismatch is wire-layer rejection (drift prevention extends to access control)
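A sketch of the write-time scope check, assuming required scopes live in registry metadata keyed by schema_id. The `REQUIRED_SCOPES` table and its contents are hypothetical, as are the function names; the check reads the already-verified claims carried in the event and does not touch the JWT signature.

```python
# Hypothetical registry metadata: schema_id -> scopes the actor's JWT must carry.
REQUIRED_SCOPES = {
    "coordination.lease_claimed.v1": {"fusion_operator", "lease_claim"},
}


def missing_scopes(event: dict) -> set:
    required = REQUIRED_SCOPES.get(event["schema_id"], set())
    return required - set(event["auth"]["scopes"])


def check_scopes(event: dict) -> None:
    """Reject the write when the recorded claims lack a required scope."""
    missing = missing_scopes(event)
    if missing:
        raise PermissionError(f"{event['schema_id']}: missing scopes {sorted(missing)}")


ok_event = {
    "schema_id": "coordination.lease_claimed.v1",
    "auth": {"scopes": ["fusion_operator", "lease_claim"]},
}
bad_event = {
    "schema_id": "coordination.lease_claimed.v1",
    "auth": {"scopes": ["fusion_operator", "lens_authoring"]},
}
check_scopes(ok_event)               # passes silently
gap = missing_scopes(bad_event)      # the scope that blocks the write
```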
7a.5 SSO topology — single vs federated
Axonis deployments use one of two SSO topologies. The coordination layer supports both, with no path-specific differences.
Single SSO (one identity provider issues all JWTs):
- All services share one trusted issuer (customer's Okta, Azure AD, Auth0, or Axonis-hosted IdP)
- JWT verification config is uniform across services
- Cross-service claims propagate without re-issuance — original JWT is forwarded in event headers
- Long-running workflows: token expiry is handled by either (a) refresh against the same IdP, or (b) workflow pause until human re-auth
- This is the typical configuration for single-firm deployments (Citi alone, Disney alone)
Federated SSO (multiple IdPs across organisational boundaries):
- Each participating organisation runs its own IdP
- Trust is established between IdPs via one of:
  - OIDC federation (one IdP trusts another's tokens directly)
  - SAML federation
  - mTLS-bound JWTs (the certificate binds the JWT to the issuer infrastructure)
  - Per-edge issuance (federation hub issues short-lived JWTs scoped to a single composition)
- The coordination event's auth.issuer field disambiguates which IdP issued the JWT
- Cross-org events go through the federation outbox (§7.5); receiver verifies the JWT signature against the trusted IdP's public key
- Long-running cross-org workflows: each organisation's portion uses its own IdP's JWT; the workflow ledger records the chain
- This is the typical configuration for multi-firm scenarios (VRS + regulated firm; Disney + Hulu + ESPN; defence partners)
JWT freshness during cross-org workflows:
- Each organisation's contribution to a composition uses a JWT issued by that organisation's IdP
- The composition's auth.delegation_chain records the originating user + each acting service account
- If a JWT expires mid-workflow, the next event in that org's portion requires either re-auth (human-driven) or a refreshed service-account JWT (system-driven). Either way the transition is recorded.
7a.6 What the coordination layer does and does not do for authorization
Does:
- Record full identity context on every event
- Enforce required scopes via schema registry (drift prevention)
- Preserve replay-quality audit including who-did-what-under-which-credential
- Support both single and federated SSO without path-specific changes
Does not:
- Validate JWT signatures (service boundary's job)
- Make ABAC decisions (UDS's job per Invariant 1)
- Establish cross-organisational trust (IdP federation configuration)
- Refresh tokens or manage credential lifecycle (auth-domain responsibility)
- Replace RBAC primitives (RBAC layered on top of ABAC at the service tier)
7a.7 Open questions specific to authorization integration
These are in addition to §9's path-selection questions:
- Single SSO or federated SSO for the first reference implementation? (Determines whether auth.issuer field disambiguation is exercised in P0.)
- Are denied-access events themselves required to be auditable, or is deny-by-omission acceptable? (Affects whether coordination.access_denied.v1 is emitted by default.)
- Is the auth.delegation_chain field load-bearing for any current customer scenario, or is it future-proofing? (Affects required-vs-optional in the schema.)
- Token-refresh strategy for long workflows: human re-auth, service-account substitution, or pause-and-resume? (Doesn't affect path choice, but affects state-machine design.)
- Capability-based delegation (narrow JWT scopes for agent actions) — required now or later? (Compatible with all paths; design decision.)
7b. Worked example — agent-driven VRS screening with federated SSO
This walks through one composition end-to-end, showing how the primitives compose. Setting: Firm A (Citi UK) is screening a customer against the VRS register. Federated SSO — Citi's IdP issues human user JWTs; VRS Ltd's IdP issues participant tokens. An agent (agent_vrs_screener_01) drives the screening lifecycle. Composition state advances through 6 declared states; 8 coordination events land on the ledger.
Assumptions for the example:
- Path A (pure-Python, JSONL ledger) for concreteness; same shape applies to other paths
- Composition ID: cmp_01HZQ4XY2N7K8R3V5W6P9T2QSL
- Schema registry contains every schema referenced
- Hash chain uses SHA-256
7b.1 Sequence
State machine        Event                                       Actor
─────────────────    ─────────────────────────────────────────   ─────────────────
requested            composition.requested.v1                    human (Citi user)
    ↓
lease_open           coordination.lease_claimed.v1               agent (claims work)
    ↓
in_progress          coordination.lens_run_started.v1            agent
    ↓
                     coordination.psi_round_completed.v1         firm + VRS (federated)
    ↓
                     coordination.lens_run_completed.v1          agent
    ↓
evidence_emitted     coordination.evidence_block_emitted.v1      agent
    ↓
results_available    coordination.matches_published.v1           service (firm Beacon)
    ↓
closed               composition.closed.v1                       human (Citi user) — attests
7b.2 Event 1 — composition.requested.v1 (human-initiated, Firm SSO)
{
  "event_id": "evt_01HZQ4XY2N7K8R3V5W6P9T2QSL_001",
  "schema_id": "composition.requested.v1",
  "composition_id": "cmp_01HZQ4XY2N7K8R3V5W6P9T2QSL",
  "ts_utc": "2026-04-30T14:00:00.000Z",
  "causal_predecessor": null,
  "actor": {
    "type": "human",
    "id": "user_smith@citi.example",
    "version": null
  },
  "auth": {
    "jwt_jti": "jti_8f3a2b1c4d5e6f70",
    "subject": "user_smith@citi.example",
    "scopes": ["fusion_operator", "vrs_screening_request"],
    "issuer": "idp.citi.example",
    "issued_at": "2026-04-30T13:55:00Z",
    "expires_at": "2026-04-30T15:55:00Z",
    "delegation_chain": ["user_smith@citi.example"]
  },
  "payload": {
    "customer_internal_ref": "FIRM-CUST-789012",
    "lens_id": "vrs_alerts_v2_equivalent",
    "lens_version": "2.0.0",
    "screening_purpose": "fca_consumer_duty_vulnerability_check"
  },
  "abac_decision": {
    "outcome": "ALLOW",
    "evaluated_at": "2026-04-30T14:00:00.012Z",
    "uds_eval_id": "ueval_4xZ8m"
  },
  "prev_hash": null,
  "this_hash": "sha256:9c2a8b...e4f1"
}
7b.3 Event 2 — coordination.lease_claimed.v1 (agent picks up work)
{
  "event_id": "evt_01HZQ4XY2N7K8R3V5W6P9T2QSL_002",
  "schema_id": "coordination.lease_claimed.v1",
  "composition_id": "cmp_01HZQ4XY2N7K8R3V5W6P9T2QSL",
  "ts_utc": "2026-04-30T14:00:00.250Z",
  "causal_predecessor": "evt_01HZQ4XY2N7K8R3V5W6P9T2QSL_001",
  "actor": {
    "type": "agent",
    "id": "agent_vrs_screener_01",
    "version": "1.4.2"
  },
  "agent": {
    "model_id": "claude-opus-4-7",
    "prompt_hash": "sha256:1f4e6c...a9b3",
    "temperature": 0.0,
    "seed": 42,
    "response_hash": "sha256:7a3d8f...c5e1",
    "tool_call_chain": ["claim_screening_task"]
  },
  "auth": {
    "jwt_jti": "jti_a1b2c3d4e5f60718",
    "subject": "service_acct_axonis_screener",
    "scopes": ["fusion_operator", "lease_claim"],
    "issuer": "idp.axonis.internal",
    "issued_at": "2026-04-30T14:00:00Z",
    "expires_at": "2026-04-30T15:00:00Z",
    "delegation_chain": [
      "service_acct_axonis_screener",
      "user_smith@citi.example"
    ]
  },
  "payload": {
    "lease_holder": "agent_vrs_screener_01",
    "lease_window_seconds": 1800,
    "claimed_at": "2026-04-30T14:00:00.250Z",
    "lease_expires_at": "2026-04-30T14:30:00.250Z"
  },
  "abac_decision": {
    "outcome": "ALLOW",
    "evaluated_at": "2026-04-30T14:00:00.255Z",
    "uds_eval_id": "ueval_4xZ8n"
  },
  "prev_hash": "sha256:9c2a8b...e4f1",
  "this_hash": "sha256:b73e1f...8d9c"
}
Notes on this event:
- Agent identity is rich — actor.id + agent.model_id + agent.prompt_hash + agent.response_hash together pin the action for replay
- auth.delegation_chain records that the service account is acting on behalf of the original Citi user — full chain preserved for repudiation defence
- abac_decision.uds_eval_id is the UDS authorization-evaluation receipt — UDS made the decision; the ledger records it
- Lease state derives from this event — no separate lease store
7b.4 Event 4 — coordination.psi_round_completed.v1 (federated, two issuers in play)
{
  "event_id": "evt_01HZQ4XY2N7K8R3V5W6P9T2QSL_004",
  "schema_id": "coordination.psi_round_completed.v1",
  "composition_id": "cmp_01HZQ4XY2N7K8R3V5W6P9T2QSL",
  "ts_utc": "2026-04-30T14:00:38.412Z",
  "causal_predecessor": "evt_01HZQ4XY2N7K8R3V5W6P9T2QSL_003",
  "actor": {
    "type": "service",
    "id": "parallax.fusion_pipeline",
    "version": "1.13.0"
  },
  "auth": {
    "jwt_jti": "jti_a1b2c3d4e5f60718",
    "subject": "service_acct_axonis_screener",
    "scopes": ["fusion_operator", "psi_participant"],
    "issuer": "idp.axonis.internal",
    "delegation_chain": [
      "service_acct_axonis_screener",
      "user_smith@citi.example"
    ]
  },
  "payload": {
    "psi_protocol": "dh-rfc3526-group14",
    "rounds": 2,
    "node_a_id": "vrs_register_node",
    "node_a_issuer": "idp.vrs.example",
    "node_a_jwt_jti": "jti_vrs_71e2f8c4d6b09a35",
    "node_b_id": "firm_node_citi_uk",
    "node_b_issuer": "idp.citi.example",
    "node_b_jwt_jti": "jti_8f3a2b1c4d5e6f70",
    "set_a_size": 5000,
    "set_b_size": 100010,
    "intersection_size": 4995,
    "raw_records_transmitted": 0
  },
  "abac_decision": {
    "outcome": "ALLOW",
    "evaluated_at": "2026-04-30T14:00:38.418Z",
    "uds_eval_id": "ueval_4xZ8q"
  },
  "prev_hash": "sha256:c891fe...2a4b",
  "this_hash": "sha256:e2b740...6f3e"
}
Notes on this event:
- Federated SSO is visible in payload.node_a_issuer (VRS IdP) vs payload.node_b_issuer (Citi IdP) — two participants, two issuers
- Each side's JWT JTI is recorded for replay / repudiation defence
- payload.raw_records_transmitted: 0 is a structural assertion — the privacy invariant. Verifiable on replay.
- This event's actor is a service (the fusion pipeline orchestrator), distinct from the agent that claimed the lease
7b.5 Event 6 — coordination.evidence_block_emitted.v1 (frozen, signed evidence)
{
  "event_id": "evt_01HZQ4XY2N7K8R3V5W6P9T2QSL_006",
  "schema_id": "coordination.evidence_block_emitted.v1",
  "composition_id": "cmp_01HZQ4XY2N7K8R3V5W6P9T2QSL",
  "ts_utc": "2026-04-30T14:00:39.115Z",
  "causal_predecessor": "evt_01HZQ4XY2N7K8R3V5W6P9T2QSL_005",
  "actor": {
    "type": "agent",
    "id": "agent_vrs_screener_01",
    "version": "1.4.2"
  },
  "agent": {
    "model_id": "claude-opus-4-7",
    "prompt_hash": "sha256:bcd421...f817",
    "temperature": 0.0,
    "seed": 42,
    "response_hash": "sha256:e9a0c7...4d12",
    "tool_call_chain": ["emit_evidence_block"]
  },
  "auth": {
    "jwt_jti": "jti_a1b2c3d4e5f60718",
    "subject": "service_acct_axonis_screener",
    "scopes": ["fusion_operator", "evidence_emit"],
    "issuer": "idp.axonis.internal",
    "delegation_chain": [
      "service_acct_axonis_screener",
      "user_smith@citi.example"
    ]
  },
  "payload": {
    "block_id": "blk_01HZQ4XY...evidence",
    "lens_id": "vrs_alerts_v2_equivalent",
    "lens_version": "2.0.0",
    "query_hash": "sha256:7f1e3a...9c5b",
    "result_hash": "sha256:4d8b62...e1a0",
    "match_count": 4995,
    "false_positives": 9,
    "false_negatives": 1,
    "f1": 0.999,
    "frozen_at": "2026-04-30T14:00:39.115Z",
    "view_hint": {
      "component": "cluster_card",
      "layout_type": "evidence_panel",
      "primary_field": "match_status"
    }
  },
  "abac_decision": {
    "outcome": "ALLOW",
    "evaluated_at": "2026-04-30T14:00:39.120Z",
    "uds_eval_id": "ueval_4xZ8s"
  },
  "prev_hash": "sha256:1aff85...cb20",
  "this_hash": "sha256:48d3e7...f192"
}
Notes on this event:
- payload.query_hash and payload.result_hash are SPEC-33 evidence-block fields — coordination ledger carries them by reference
- payload.view_hint is the SPEC-33 ViewHint contract embedded in the event payload (Beacon dispatcher reads this to pick the renderer)
- The block is frozen — frozen_at is set; prev_hash + this_hash make tampering detectable
7b.6 Event 8 — composition.closed.v1 (human attestation, terminal state)
{
  "event_id": "evt_01HZQ4XY2N7K8R3V5W6P9T2QSL_008",
  "schema_id": "composition.closed.v1",
  "composition_id": "cmp_01HZQ4XY2N7K8R3V5W6P9T2QSL",
  "ts_utc": "2026-04-30T14:08:22.840Z",
  "causal_predecessor": "evt_01HZQ4XY2N7K8R3V5W6P9T2QSL_007",
  "actor": {
    "type": "human",
    "id": "user_smith@citi.example",
    "version": null
  },
  "auth": {
    "jwt_jti": "jti_8f3a2b1c4d5e6f70",
    "subject": "user_smith@citi.example",
    "scopes": ["edition_attest", "vrs_screening_review"],
    "issuer": "idp.citi.example",
    "delegation_chain": ["user_smith@citi.example"]
  },
  "payload": {
    "decision": "matches_attested_for_action",
    "attested_at": "2026-04-30T14:08:22.840Z",
    "edition_id": "edition_01HZQ4XY...att",
    "evidence_block_ref": "blk_01HZQ4XY...evidence"
  },
  "abac_decision": {
    "outcome": "ALLOW",
    "evaluated_at": "2026-04-30T14:08:22.845Z",
    "uds_eval_id": "ueval_4xZ8u"
  },
  "prev_hash": "sha256:7c3e0a...d4b8",
  "this_hash": "sha256:a195fe...3c2d"
}
Notes on this event:
- Terminal state closed is reached only by a human-initiated transition — Inv 6 enforced by state-machine schema metadata declaring transitions[*].source: human for transitions into terminal states
- Attestation references the evidence_block by ID — the chain of custody is mechanical: composition → events → evidence_block → frozen result hash
- The user's JWT is the same jti_8f3a2b1c4d5e6f70 as Event 1 (still within the 2-hour validity window) — no token refresh was needed for this short workflow
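The source constraint on terminal transitions can be sketched as data plus two checks: one on the transition table itself, one on each incoming event. The table below is a hypothetical two-row excerpt, and `check_table` / `advance` are illustrative names, not the state-machine library's API.

```python
TERMINAL_STATES = {"attested", "closed"}

# Hypothetical excerpt of a declarative transition table with a source constraint.
TRANSITIONS = [
    {"from": "in_progress", "to": "evidence_emitted",
     "on_event": "coordination.evidence_block_emitted.v1", "source": "agent"},
    {"from": "results_available", "to": "closed",
     "on_event": "composition.closed.v1", "source": "human"},
]


def check_table(transitions: list) -> None:
    """Reject any table in which a non-human source can reach a terminal state."""
    for t in transitions:
        if t["to"] in TERMINAL_STATES and t["source"] != "human":
            raise ValueError(f"{t['from']} -> {t['to']} must declare source: human")


def advance(state: str, event: dict) -> str:
    """Apply one event, enforcing the declared actor type for the transition."""
    for t in TRANSITIONS:
        if t["from"] == state and t["on_event"] == event["schema_id"]:
            if event["actor"]["type"] != t["source"]:
                raise PermissionError(
                    f"{event['schema_id']} requires actor.type {t['source']!r}, "
                    f"got {event['actor']['type']!r}"
                )
            return t["to"]
    raise ValueError(f"no transition from {state!r} on {event['schema_id']}")


check_table(TRANSITIONS)
closed_state = advance(
    "results_available",
    {"schema_id": "composition.closed.v1", "actor": {"type": "human"}},
)
```

Checking the table once at load time catches a mis-declared workflow before any event is processed, while `advance` catches an agent attempting a human-only step at run time.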
7b.7 What replay of this composition produces
from axonis_core.ledger import Ledger
from axonis_core.statemachine import vrs_screening_workflow

ledger = Ledger.open("~/axonis/ledger/composition_events.jsonl")

# Verify chain integrity
ledger.verify()  # raises if any prev_hash mismatch

# Replay this composition
events = list(ledger.replay("cmp_01HZQ4XY2N7K8R3V5W6P9T2QSL"))
assert len(events) == 8
assert events[0].schema_id == "composition.requested.v1"
assert events[-1].schema_id == "composition.closed.v1"

# Verify state machine arrived at terminal state by valid path
final_state = vrs_screening_workflow.replay_state(events)
assert final_state == "closed"

# Verify privacy invariant
psi_event = next(e for e in events if e.schema_id == "coordination.psi_round_completed.v1")
assert psi_event.payload["raw_records_transmitted"] == 0

# Verify federated SSO chain
human_events = [e for e in events if e.actor.type == "human"]
assert all(e.auth.issuer == "idp.citi.example" for e in human_events)
assert psi_event.payload["node_a_issuer"] == "idp.vrs.example"
assert psi_event.payload["node_b_issuer"] == "idp.citi.example"

# Verify all ABAC decisions were ALLOW
assert all(e.abac_decision["outcome"] == "ALLOW" for e in events)
Each assertion turns a regulator-defensible claim into a mechanical check: chain integrity, replay determinism, state-machine validity, the privacy invariant, federated SSO posture, and ABAC outcomes. The same eight events provide evidence for GDPR Art. 30 (ROPA), Art. 32 (security of processing), Art. 35 (DPIA replay), and FCA Consumer Duty PRIN 2A audit obligations.
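How a replay might recover event order from causal_predecessor alone can be sketched as below. This is an assumption about one possible implementation, not a description of Ledger.replay: it treats events as plain dicts, assumes each composition forms a single linear chain whose first event has a null predecessor, and the function name replay_in_causal_order is invented for the example.

```python
def replay_in_causal_order(events, composition_id):
    """Order one composition's events by following causal_predecessor links."""
    mine = [e for e in events if e["composition_id"] == composition_id]

    # Index each event by the ID of the event it follows; the root follows None.
    by_predecessor = {e["causal_predecessor"]: e for e in mine}
    if len(by_predecessor) != len(mine):
        raise ValueError("branching history: two events share a predecessor")

    ordered, seen = [], set()
    current = by_predecessor.get(None)
    while current is not None:
        if current["event_id"] in seen:
            raise ValueError("cycle in causal chain")
        seen.add(current["event_id"])
        ordered.append(current)
        current = by_predecessor.get(current["event_id"])

    if len(ordered) != len(mine):
        raise ValueError("broken causal chain: some events are unreachable from the root")
    return ordered


# Events arrive out of order and mixed with another composition.
stored = [
    {"event_id": "e3", "composition_id": "cmp_a", "causal_predecessor": "e2"},
    {"event_id": "x1", "composition_id": "cmp_b", "causal_predecessor": None},
    {"event_id": "e1", "composition_id": "cmp_a", "causal_predecessor": None},
    {"event_id": "e2", "composition_id": "cmp_a", "causal_predecessor": "e1"},
]
ordered_ids = [e["event_id"] for e in replay_in_causal_order(stored, "cmp_a")]
assert ordered_ids == ["e1", "e2", "e3"]
```

Ordering by causal links rather than by ts_utc keeps the result independent of clock skew between services.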
7b.8 What this example demonstrates about each primitive
| Primitive | Demonstrated by |
|---|---|
| Schema registry (drift) | Every event has schema_id; validator enforces shape at write |
| Hash-chained ledger (audit) | prev_hash + this_hash chain; ledger.verify() is mechanical |
| State machine (no off-rails) | 6 states × declared transitions; replay_state() arrives at closed only via valid path |
| Agents as actors (Inv 6) | Agent emits propose-events; only human emits composition.closed.v1 |
| Lease lifecycle | lease_claimed → lease_released events with explicit window |
| JWT actions | Every event has auth block with JWT JTI, scopes, delegation chain |
| ABAC integration | Every event records abac_decision from UDS; ledger doesn't decide |
| Federated SSO | Two issuers visible in payload (idp.citi.example + idp.vrs.example) |
| ViewHint (SPEC-33) | Embedded in evidence_block_emitted payload |
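The "validator enforces shape at write" row can be illustrated with a small in-tree registry. The sketch below is hypothetical: the REGISTRY layout, the validate_payload and append helpers, and the strict rejection of unexpected fields are assumptions. The field names for composition.closed.v1 are taken from the Event 8 payload in 7b.6.

```python
# Hypothetical in-tree registry: schema_id -> required payload fields and their types.
REGISTRY = {
    "composition.closed.v1": {
        "decision": str,
        "attested_at": str,
        "edition_id": str,
        "evidence_block_ref": str,
    },
}


def validate_payload(event):
    """Return a list of problems; an empty list means the payload matches its schema."""
    schema = REGISTRY.get(event["schema_id"])
    if schema is None:
        return [f"unregistered schema_id {event['schema_id']!r}"]
    payload = event["payload"]
    problems = [f"missing field {k!r}" for k in sorted(schema.keys() - payload.keys())]
    problems += [f"unexpected field {k!r}" for k in sorted(payload.keys() - schema.keys())]
    problems += [
        f"field {k!r} should be {t.__name__}"
        for k, t in schema.items()
        if k in payload and not isinstance(payload[k], t)
    ]
    return problems


def append(ledger, event):
    """Write-time gate: an event that fails validation never reaches the ledger."""
    problems = validate_payload(event)
    if problems:
        raise ValueError("; ".join(problems))
    ledger.append(event)


ledger = []
good = {
    "schema_id": "composition.closed.v1",
    "payload": {
        "decision": "matches_attested_for_action",
        "attested_at": "2026-04-30T14:08:22.840Z",
        "edition_id": "edition_example",          # placeholder value
        "evidence_block_ref": "blk_example",      # placeholder value
    },
}
append(ledger, good)
assert len(ledger) == 1
```

Rejecting unexpected fields as well as missing ones is the stricter choice: a producer that adds a field without registering a new schema version fails at write time instead of drifting silently.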
8. Cost / time / risk summary
| Path | Implementation cost | Operational cost | Federation risk | Maturity risk |
|---|---|---|---|---|
| A Pure Python in axonis-core | medium (4-6 weeks focused) | low (no new services) | low (federation-native) | medium (in-house code) |
| B Temporal + Confluent | medium-high (similar coding + op setup) | high (2-3 new services) | high (single-cluster pattern) | low (proven ecosystems) |
| C NATS JetStream + in-tree schemas | medium-high | medium (one new service) | low (NATS federation-native) | medium (newer ecosystem at scale) |
| D Existing UDS/Elastic | low (mostly conventions + library) | none added | medium (cross-cluster Elastic ops) | medium (audit-quality contention) |
9. Open questions for the team to resolve before deciding
- What is the regulator-grade audit standard the customer requires? (Determines whether tamper-evidence at the storage layer is mandatory, which excludes Path D.)
- Is the coordination layer expected to run on a Lookup-Light Edge Node, or only on full-Lens Edge Nodes? (A requirement to run on Lookup-Light Edge Nodes excludes Path B.)
- What's the team's tolerance for adopting a second messaging substrate (NATS) alongside RabbitMQ? (Determines viability of Path C.)
- What's the team's tolerance for in-house libraries vs adopted frameworks? (Affects the A vs B trade-off.)
- How critical is workflow-authoring tooling (UI, BPMN modeller) for this iteration vs later? (B brings these; A defers.)
- What customer scenario will be the first reference implementation — VRS screening, multi-INT cross-cue, Disney 5-way? (Affects which workflow patterns the state machine needs to handle.)
- Is there a target date by which the coordination layer must be live in front of a customer? (Affects implementation-speed weighting.)
10. What this document does not do
- Does not recommend a path
- Does not assume one of the paths is already partially built
- Does not assess which engineering team members would own which path
- Does not score the paths against a fixed weighting of decision criteria
- Does not address the gap between SPEC-34 (this document) and what the platform actually has today
These are the conversations to have after the path is chosen.
11. Glossary
| Term | Meaning |
|---|---|
| Composition | A workflow instance — one customer screening, one fusion run, one investigation |
| Composition event | A typed, schema-registered, durable record of one cross-service interaction within a composition |
| Coordination ledger | The append-only store containing composition events, in whichever backing form a chosen path uses |
| Schema registry | The mechanism (file-based, service-based, or vendor-product) that enforces typed event contracts |
| State machine | The declarative definition of valid composition workflows |
| Causal predecessor | The event ID this event logically follows; enables replay and audit reconstruction |
| Federation outbox | The pattern by which cross-organisational replication of events is staged at the producer and verified at the receiver |
Depends on: component.parallax.fusion-adi-integration, component.parallax.wire-message-families
Realizes: product.fusion