Bayesian Posterior Engine — Sensor-Conditioned Threat Surfaces

Status & scope

Status: Draft — ready to implement.
Module: lens/threat/bayesian.py (new) + artifact additions to lens/models/artifacts.py.
Definition of done: pytest tests/test_bayesian.py green.
Sister specs: component.prism.incremental-updates — the streaming case of this engine.
Milestone: Phase 4 (continuous operation).
Date: 2026-06-03.

1. Problem

Prism's threat domain (SPEC-08) composes multiple layers into one surface with a weighted linear sum: composite[cell] = Σ wᵢ · layerᵢ[cell]. This is the right primitive for combining forecast priors — several authoritative depth surfaces (NWM, HEC-RAS, SLOSH, FloodHub-riverine), each reprojected to a common grid and weighted by published skill. It is the wrong primitive for combining a prior surface with in-situ sensor observations.

A sensor reading is not another surface to average in. It is evidence that should condition the prior: a gauge measuring 1.1 m at a point should pull nearby cells toward that observation in proportion to (a) how close they are and (b) how trustworthy the sensor is, and should leave distant cells essentially unchanged. Linear composition cannot express this — it has no notion of observation uncertainty, no notion of spatial influence falloff, and no notion of an output distribution.

Three capabilities are missing today. Verified absent: no bayesian, posterior, kriging, or gaussian_process symbol exists anywhere under lens/.

Conditioning a prior on observations. There is no bayesian_update — only compute_risk_surface (weighted sum).
Posterior exceedance surfaces at named thresholds. compute_risk_surface returns a point estimate per cell, not P(depth ≥ X) at the operational thresholds.
Disagreement / variance output. Where the prior and the sensors conflict, nothing surfaces that conflict explicitly.

The requirement is stated directly in the Houston-federated fixture (themis, external). Plain-language form:

"70% chance water exceeds 1 foot here by 2 PM" — paired with a number showing how much the four models disagree. When they disagree, the system says so out loud.

And in audit terms (HOUSTON-FEDERATED-V0-AUDIT-PLAN.md §B.2): the missing half is the "Bayesian-update half — sensor likelihood (Gaussian, σ from sensor history) + spatial extrapolation → analytical posterior per cell at each threshold," plus a "disagreement / variance surface … needs engine layer + output type." §E.1 scopes it as a "discrete new module … bounded and analytically tractable per the V0 spec ('geostatistics 101; reviewers can verify by hand on small cells')," ~2–3 weeks, "Audit-class: explainable."

This spec defines that module.

2. Decision

Add a grid-cell Bayesian update that conditions a prior surface on a set of point sensor observations and produces a posterior distribution per cell (Gaussian: mean grid + variance grid). From the posterior we derive exceedance surfaces P(depth ≥ X) at the named operational thresholds and a disagreement surface that localizes prior-vs-observation conflict.

Model — conjugate Gaussian update per cell. Each cell c carries a prior N(μ₀[c], σ₀²[c]) derived from the composed prior surface. Each sensor observation oₖ = (lat, lon, value, σ) is a Gaussian likelihood N(value, σ²). A sensor's influence on a cell is not uniform — it is scaled by a distance-decay kernel K(d) so the observation's effective precision at cell c is K(dₖc) / σₖ². The posterior of a Gaussian with multiple independent Gaussian observations is the standard precision-weighted closed form:

prior precision    τ₀ = 1 / σ₀²
obs precision      τₖ(c) = K(dₖc) / σₖ²      (kernel scales the sensor's influence at c)
posterior precision τ_post(c) = τ₀ + Σₖ τₖ(c)
posterior mean      μ_post(c) = ( τ₀·μ₀ + Σₖ τₖ(c)·valueₖ ) / τ_post(c)
posterior variance  σ²_post(c) = 1 / τ_post(c)

This is exact, closed-form, deterministic, and verifiable by hand on a single cell — which is the property the audit plan requires for replay determinism (it names the Bayesian module as "the highest-risk surface" for floating-point variance, §C.1/D).

Kernel — analytically tractable, not full GP regression. Spatial influence falloff is a distance-decay kernel evaluated independently per (sensor, cell) pair. The default and only required kernel for V0 is a Gaussian (squared-exponential) kernel K(d) = exp(−d² / (2·ℓ²)) with length-scale ℓ (an inverse-distance kernel K(d) = 1/(1 + (d/ℓ)ⁿ) is offered as an alternative). We deliberately do not implement full Gaussian-process regression (a global covariance matrix inversion over all cells) for V0. Full GP / kriging is recorded as an Open Question (§10) with the per-pair kernel as the default. Rationale: Invariant 6 favours deterministic, explainable algorithms; the per-pair kernel decomposes per cell into "prior term + one term per sensor" that a reviewer can read off, whereas GP regression's matrix inverse is FP-fragile across platforms and opaque to a per-cell explanation. See §7 for how this resolves the apparent conflict with the fixture's gaussian_process_kernel wording.

Where it sits. bayesian_update is a new composition method that runs after the weighted-prior composition. The weighted sum (SPEC-08) builds the prior surface from the forecast priors; the Bayesian update conditions that prior on the sensor likelihood. They are two stages, not competitors.

3. Public API

All dataclasses frozen; all functions pure (no I/O, no network, no global state). Module: lens/threat/bayesian.py. The posterior artifact lands in lens/models/artifacts.py alongside RiskSurface, Route, etc.

# lens/models/artifacts.py  (additions)

from __future__ import annotations
from dataclasses import dataclass, field
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    import numpy as np
# numpy resolves locally inside as_numpy() — `import lens` must stay green with
# base deps only (component.prism.packaging DoD); numpy is extras-only.


@dataclass(frozen=True)
class SensorObservation:
    """A single point observation feeding the Bayesian likelihood."""

    lat: float
    lon: float
    value: float                 # observed quantity (e.g. depth in metres)
    sigma: float                 # observation std-dev; smaller = more trusted
    timestamp: str = ""          # ISO-8601 UTC; advisory for temporal decay (§10)
    sensor_id: str = ""
    trust_class: str = "high"    # advisory label; sigma carries the actual weight


@dataclass(frozen=True)
class PosteriorSurface:
    """Per-cell Gaussian posterior after conditioning a prior on observations.

    mean_grid[r, c] and variance_grid[r, c] define N(mean, variance) for cell (r, c).
    Grids are row-major, same shape and bbox convention as RiskSurface.
    """

    mean_grid: tuple[tuple[float, ...], ...] = ()      # immutable, JSON-stable
    variance_grid: tuple[tuple[float, ...], ...] = ()
    grid_rows: int = 0
    grid_cols: int = 0
    bbox: tuple[float, float, float, float] = (0.0, 0.0, 0.0, 0.0)  # min_lat,min_lon,max_lat,max_lon
    observations_used: tuple[SensorObservation, ...] = ()
    kernel: str = "gaussian"
    length_scale_m: float = 0.0
    metadata: dict = field(default_factory=dict)

    def exceedance(self, threshold: float) -> tuple[tuple[float, ...], ...]:
        """P(value >= threshold) per cell. Convenience wrapper over exceedance_surface()."""
        ...

    def as_numpy(self) -> tuple[np.ndarray, np.ndarray]:
        """(mean, variance) as 2-D arrays for downstream numeric ops."""
        ...

# lens/threat/bayesian.py

from __future__ import annotations
from dataclasses import dataclass
import numpy as np
from lens.models.artifacts import PosteriorSurface, SensorObservation
from lens.threat.surface import RiskSurface


@dataclass(frozen=True)
class KernelParams:
    """Distance-decay kernel configuration."""

    kind: str = "gaussian"        # "gaussian" (default) | "inverse_distance"
    length_scale_m: float = 2000.0
    inverse_power: float = 2.0    # only for inverse_distance
    prior_sigma_floor: float = 0.5  # min prior std-dev so a cell is never infinitely certain
    max_influence_m: float = 0.0  # 0 = unbounded; else hard cut-off for sparse compute


def bayesian_update(
    prior: RiskSurface | np.ndarray,
    observations: tuple[SensorObservation, ...] | list[SensorObservation],
    bbox: tuple[float, float, float, float],
    kernel: KernelParams = KernelParams(),
    prior_variance: np.ndarray | None = None,
) -> PosteriorSurface:
    """Condition a prior surface on point observations → Gaussian posterior per cell.

    Closed-form precision-weighted Gaussian conjugate update (§2). Pure and
    deterministic: identical inputs → bitwise-identical grids.

    Args:
        prior: RiskSurface (mean read from cell.risk_score) or a 2-D mean grid.
        observations: point sensors, each carrying value + sigma.
        bbox: (min_lat, min_lon, max_lat, max_lon) — defines cell centre coords.
        kernel: distance-decay kernel + prior variance floor.
        prior_variance: optional per-cell prior variance grid; else derived
            from kernel.prior_sigma_floor (uniform prior uncertainty).

    Returns:
        PosteriorSurface with mean_grid + variance_grid.
    """
    ...


def exceedance_surface(
    posterior: PosteriorSurface,
    threshold: float,
) -> np.ndarray:
    """P(value >= threshold) per cell, from the Gaussian posterior.

    Uses the closed-form normal survival function:
        P(X >= t) = 0.5 * erfc( (t - mu) / (sigma * sqrt(2)) )
    erfc via math.erfc (deterministic, libm) — no scipy, no sampling.
    """
    ...


def disagreement_surface(
    posterior: PosteriorSurface,
    prior: RiskSurface | np.ndarray,
) -> np.ndarray:
    """Per-cell conflict between prior and observation-driven posterior.

    Returns a [0, ∞) grid: standardized shift of the posterior mean away from
    the prior mean, normalized by posterior uncertainty:
        disagreement[c] = |mu_post[c] - mu_prior[c]| / sqrt(var_post[c])
    High where sensors pulled a cell hard away from the prior (genuine conflict);
    near-zero where prior and sensors agree or no sensor reached the cell.
    This is the "where the experts disagree" surface the fixture requires.
    """
    ...

exceedance at the three named thresholds and disagreement_surface are the two output rasters the Houston lens declares (probability_of_exceedance_raster + disagreement_raster).

4. Evidence integration

A PosteriorSurface is an evidence-block artifact like any other lens output. The orchestrator (run_lens) calls create_evidence_block(spec, inputs, output=<posterior payload>), which hashes the canonicalized output (lens/evidence.py:_deterministic_hash — JSON, sort_keys=True).

Determinism requirements so identical inputs yield identical block hashes (Invariants 3 & 4 — query hash is source of truth; frozen means frozen):

Observation ordering is canonicalized before update — sort by (sensor_id, lat, lon, timestamp). Gaussian conjugate update is order-independent in exact arithmetic, but summation order affects floating-point rounding; pinning the order makes the result bitwise-reproducible across runs and machines (the FP-determinism mitigation the audit plan demands, §C.1).
Grids serialize as nested tuples of float, not numpy arrays — JSON-stable, no dtype/endianness leakage into the hash.
No timestamps in the hashed payload beyond those carried in the canonical observation list. Wall-clock compute_timestamp lives in Provenance, which is outside the result_hash.
erfc/exp via libm only — CPU floating-point, no GPU, no vectorized BLAS path that could reorder reductions.

The evidence payload contains the mean grid, variance grid, kernel params, the canonical observation list (id + value + sigma — not raw provider blobs), and the threshold set, so the posterior is fully reconstructible from the block.

5. Lens YAML surface

A threat lens declares a bayesian_update composition method alongside the existing weighted-prior layers. The parser (SPEC-02) gains a composition.method field with two values: weighted_composite (today's default) and bayesian_update (this spec).

lens:
  id: houston_flood_composition_v1
  type: threat
  version: 1.0.0

# Stage 1: forecast priors → weighted prior surface (SPEC-08, unchanged)
priors:
  - id: nwm        ; source: noaa.nwm.retrospective    ; weight: 0.30
  - id: hec_ras    ; source: hcfcd.maapnext.hec_ras_2d ; weight: 0.50
  - id: slosh      ; source: noaa.slosh.archives       ; weight: 0.20

# Stage 2: condition the prior on sensor likelihood (this spec)
composition:
  method: bayesian_update          # vs. weighted_composite (default)
  prior_from: priors               # the weighted prior surface is the prior mean
  likelihood:
    sources:
      - id: hcfcd_gauges ; source: hcfcd.gauge_api ; trust_class: high
      - id: usgs_nwis    ; source: usgs.nwis.iv    ; trust_class: high
    sigma_from: sensor_history     # σ per sensor from its historical variance
  kernel:
    kind: gaussian                 # gaussian (default) | inverse_distance
    length_scale_m: 2000
    prior_sigma_floor: 0.5

output:
  exceedance_thresholds_m: [0.5, 1.0, 2.0]   # named: vehicle_disable / life_safety / structural
  disagreement_surface: true

The orchestrator routes on composition.method: weighted_composite → compute_risk_surface (existing path); bayesian_update → build the prior surface from priors, then call bayesian_update(prior, observations, ...), then exceedance_surface per threshold and disagreement_surface. Lenses without a composition.method field default to weighted_composite (backward compatible — existing packs unaffected).

6. Invariants

Deterministic. Closed-form conjugate update + libm erfc/exp. Same inputs → bitwise-identical grids → identical evidence-block hash. No sampling, no RNG, no iterative solver, no matrix inversion. (Invariant 4.)
Pure, no infrastructure. numpy + stdlib math only. No network, no ES, no Dask, no file I/O inside the engine. Observations arrive as SensorObservation tuples; the engine never fetches them. (CLAUDE.md standalone constraint.)
Explainable per cell — required for the audit-class label. For any cell the posterior decomposes into named contributions: the prior term (τ₀, μ₀) and one term per sensor (sensor_id, distance, K(d), σ, effective_precision, value). A debug accessor explain_cell(posterior, prior, observations, r, c) -> dict returns this decomposition so a reviewer can reconstruct μ_post by hand — the "verify by hand on small cells" property (Invariant 6: AI assists, humans attest; the human must be able to read the math).
No decisions. The engine outputs probabilities and uncertainty. Threshold-crossing → action mapping is the lens/Sentinel layer, not this module. (Invariant 6.)
Append-only evidence. Posterior blocks are frozen on creation; recomposition (SPEC-19) appends a new block, never mutates. (Invariant 2.)

7. Houston requirement ↔ Prism architecture — the one conflict, resolved

The fixture's lens sketch (HOUSTON-SOURCES-AND-SPEC.md) writes spatial_extrapolation: gaussian_process_kernel and method: bayesian_update_gaussian_per_sensor. Read literally, "Gaussian process" suggests full GP regression — a global covariance matrix over all cells, inverted once, producing jointly-consistent posterior means and a full covariance. That conflicts with the deterministic/explainable invariant in two ways: (1) the O(N³) matrix inverse is floating-point-fragile across platforms — exactly the replay-determinism risk the audit plan flags for this module; (2) the result is not decomposable into a per-cell "prior + one term per sensor" explanation a human can verify by hand.

Resolution: This spec reads the fixture's intent — Gaussian likelihood + spatially-decaying sensor influence — and implements it with the per-(sensor, cell) Gaussian distance-decay kernel (§2), which is the squared-exponential kernel function that GP regression itself uses, applied independently per pair rather than assembled into a global covariance matrix. This satisfies the fixture's measurable requirements (Gaussian likelihood, spatial extrapolation, posterior exceedance at named thresholds, disagreement surface) and the audit plan's own framing ("geostatistics 101; reviewers can verify by hand"), while staying deterministic and explainable. Full GP regression remains an Open Question (§10) with the per-pair kernel as the standing default — to be adopted only if a fixture demonstrably requires joint posterior covariance and the FP-determinism pre-flight (audit §C.1) passes for the matrix-inverse path. No other Houston requirement conflicts with Prism architecture.

8. Testing — `tests/test_bayesian.py`

Test	Asserts
`test_single_cell_conjugate_closed_form`	One cell, one sensor directly on it (`K=1`). Hand-computed `μ_post`, `σ²_post` from `τ₀=1/σ₀²`, `τ₁=1/σ₁²`. Assert `bayesian_update` matches the closed form to ≤1e-9. The "verify by hand" anchor.
`test_no_observations_returns_prior`	Empty observation list → posterior mean == prior mean, posterior variance == prior variance floor.
`test_distant_sensor_no_effect`	Sensor far beyond the length-scale → posterior ≈ prior at distant cells (kernel → 0); near-zero shift.
`test_kernel_monotone_falloff`	Influence on a cell decreases monotonically with distance for both `gaussian` and `inverse_distance` kernels.
`test_exceedance_at_named_thresholds`	For a cell with known `(μ, σ)`, `exceedance_surface` at `{0.5, 1.0, 2.0}` matches `0.5*erfc((t-μ)/(σ√2))` to ≤1e-9; ordering `P(≥0.5) ≥ P(≥1.0) ≥ P(≥2.0)`.
`test_exceedance_bounds`	All exceedance probabilities ∈ [0, 1]; `μ ≫ t` → ≈1; `μ ≪ t` → ≈0.
`test_disagreement_sanity`	Sensor value far from prior near a cell → high disagreement there; prior==observation → ≈0; no-sensor cell → ≈0.
`test_determinism_hash_stability`	Run `bayesian_update` twice (observations shuffled between runs) → identical grids; `create_evidence_block` → identical `result_hash`. Canonical ordering proof.
`test_explain_cell_reconstructs_posterior`	`explain_cell` terms (prior + per-sensor) recombine to the reported `μ_post`/`σ²_post`. Explainability invariant.
`test_houston_mini_fixture_integration`	8×8 grid, weighted prior from 2 mock priors, 3 mock gauges (one in conflict). End-to-end: prior → `bayesian_update` → `exceedance_surface` ×3 → `disagreement_surface` → evidence block. Asserts shapes, exceedance ordering, and the conflict cell shows elevated disagreement. The fixture-shaped acceptance test (data inline; no network, no themis dependency).

All tests CPU-only, pure-Python, deterministic; no fixtures fetched.

9. Acceptance criteria

[ ] SensorObservation and PosteriorSurface frozen dataclasses added to lens/models/artifacts.py
[ ] bayesian_update(prior, observations, bbox, kernel) -> PosteriorSurface implemented, closed-form, pure
[ ] exceedance_surface(posterior, threshold) -> np.ndarray via libm erfc; thresholds {0.5, 1.0, 2.0} exercised
[ ] disagreement_surface(posterior, prior) -> np.ndarray implemented
[ ] Gaussian (default) and inverse-distance kernels implemented via KernelParams
[ ] explain_cell per-cell decomposition accessor implemented
[ ] Posterior surfaces hash deterministically through create_evidence_block (canonical observation ordering)
[ ] Parser accepts composition.method: bayesian_update; orchestrator routes to this engine; absent field defaults to weighted_composite (backward compatible)
[ ] pytest tests/test_bayesian.py green — all tests in §8

10. What themis measures

The Houston-federated fixture (themis, external) instruments this engine against FED-SPEC-04 §2.B (Cost-Lens Performance) and the audit plan's benchmarks:

04.B.1 Surface continuity — posterior mean/variance grids are C0-continuous; no spurious discontinuities at sensor locations or kernel edges.
04.B.2 Weight sensitivity — ∂(posterior)/∂(length_scale, prior_sigma_floor, σ) bounded; small input perturbation → small output change.
04.B.5 / 04.B.6 Latency / throughput — raster-heavy compute; single-update p50/p99 and decisions/sec on TRM-1.
Replay determinism (FED-SPEC-02, audit §C.1) — the named pre-flight: run bayesian_update on two Macs with identical inputs; assert bitwise-identical grids before the fixture claims replay determinism. This module is flagged as the highest-FP-risk surface, so this gate is mandatory.
Provenance completeness (audit §C.2) — the evidence block reconstructs the posterior from its inputs (priors + canonical observations + kernel), no dangling references across the three federates.
Posterior + disagreement output type (audit §B.2) — the fixture's headline output: P(depth ≥ X) at {0.5m, 1m, 2m} plus the per-cell disagreement raster, both produced by this engine.

11. Open questions (with defaults)

Full GP / kriging vs. per-pair kernel. Default: per-pair Gaussian distance-decay kernel (§2/§7). Adopt full GP regression (global covariance, joint posterior) only if a fixture demonstrably needs joint cross-cell posterior covariance and the FP-determinism pre-flight passes for the matrix-inverse path. The per-pair kernel is the standing, audit-class-explainable default.
Non-Gaussian likelihoods. Some sensors (binary "flooded / not flooded" citizen reports, censored gauge readings at rail limits) are not Gaussian. Default: Gaussian only for V0; model low-trust/binary sources by inflating σ (down-weighting) rather than changing the likelihood family. Beta/Bernoulli likelihoods are a future extension behind a likelihood_family field — they break the conjugate closed form and need a determinism story first.
Temporal decay of observations. A gauge reading from 3 hours ago is weaker evidence than one from 5 minutes ago. Default: no temporal decay in V0 — timestamp is carried on SensorObservation (advisory) but does not scale precision. When added, decay scales σ (or the kernel) by an explicit half-life parameter so it stays in the closed form and remains deterministic; this is the natural tie-in to SPEC-19 incremental recomposition (a stale observation's footprint shrinks over time).
Prior variance source. Default: uniform prior_sigma_floor across cells. A future refinement derives per-cell prior variance from the inter-prior spread (the variance of NWM/HEC-RAS/SLOSH at each cell), making the prior less certain exactly where the forecasts already disagree — which would also feed the disagreement surface. Deferred to keep V0's prior simple and hand-verifiable.

12. Cross-references

Document	Relationship
SPEC-08 (axonis-lens)	Threat Engine — `compute_risk_surface` builds the weighted prior surface this engine conditions; shares the grid/cell + bbox model.
SPEC-04 (axonis-lens)	Evidence Model — posterior surfaces are frozen evidence blocks via `create_evidence_block`; determinism rules in §4.
SPEC-02 (axonis-lens)	Lens Parser — extended with `composition.method: bayesian_update` (§5).
SPEC-19 (axonis-lens)	Incremental Updates — the streaming case of this engine: a gauge update re-runs `bayesian_update` over the affected footprint; threshold crossings on the posterior become signals. Temporal-decay open question ties in here.
`HOUSTON-FEDERATED-V0-AUDIT-PLAN.md` (themis)	§B.2 names this gap; §E.1 scopes the module (~2–3 weeks, "analytically tractable", audit-class explainable); §C.1 the FP-determinism pre-flight.
`HOUSTON-SOURCES-AND-SPEC.md` (themis)	Sources table (priors vs. observations vs. truth) + the `houston_flood_composition_v1` lens sketch this spec implements.
`HOUSTON-PLAIN-LANGUAGE.md` (themis)	The decision-maker requirement: probability of exceedance + disagreement, in plain terms.
FED-SPEC-04 §2.B (themis)	Cost-Lens performance benchmarks this engine is measured against (§10).
FED-SPEC-02 (themis)	Replay determinism — this engine is the highest-FP-risk surface; §4 + §10 define the mitigation and pre-flight gate.

Owner: CPO. Houston-federated fixture (themis, external) is the driving initiative. Next review when the single-cell conjugate test and the Houston mini-fixture integration test land green.

Depends on: component.prism.evidence-model, component.prism.packaging, component.prism.threat-engine, component.prism.universal-lens-parser

Realizes: product.lens