Bayesian Posterior Engine — Sensor-Conditioned Threat Surfaces
Status & scope
- Status: Draft — ready to implement.
- Module:
lens/threat/bayesian.py(new) + artifact additions tolens/models/artifacts.py. - Definition of done:
pytest tests/test_bayesian.pygreen. - Sister specs: component.prism.incremental-updates — the streaming case of this engine.
- Milestone: Phase 4 (continuous operation).
- Date: 2026-06-03.
1. Problem
Prism's threat domain (SPEC-08) composes multiple layers into one surface with a weighted linear sum: composite[cell] = Σ wᵢ · layerᵢ[cell]. This is the right primitive for combining forecast priors — several authoritative depth surfaces (NWM, HEC-RAS, SLOSH, FloodHub-riverine), each reprojected to a common grid and weighted by published skill. It is the wrong primitive for combining a prior surface with in-situ sensor observations.
A sensor reading is not another surface to average in. It is evidence that should condition the prior: a gauge measuring 1.1 m at a point should pull nearby cells toward that observation in proportion to (a) how close they are and (b) how trustworthy the sensor is, and should leave distant cells essentially unchanged. Linear composition cannot express this — it has no notion of observation uncertainty, no notion of spatial influence falloff, and no notion of an output distribution.
Three capabilities are missing today. Verified absent: no bayesian, posterior, kriging, or gaussian_process symbol exists anywhere under lens/.
- Conditioning a prior on observations. There is no
bayesian_update— onlycompute_risk_surface(weighted sum). - Posterior exceedance surfaces at named thresholds.
compute_risk_surfacereturns a point estimate per cell, notP(depth ≥ X)at the operational thresholds. - Disagreement / variance output. Where the prior and the sensors conflict, nothing surfaces that conflict explicitly.
The requirement is stated directly in the Houston-federated fixture (themis, external). Plain-language form:
"70% chance water exceeds 1 foot here by 2 PM" — paired with a number showing how much the four models disagree. When they disagree, the system says so out loud.
And in audit terms (HOUSTON-FEDERATED-V0-AUDIT-PLAN.md §B.2): the missing half is the "Bayesian-update half — sensor likelihood (Gaussian, σ from sensor history) + spatial extrapolation → analytical posterior per cell at each threshold," plus a "disagreement / variance surface … needs engine layer + output type." §E.1 scopes it as a "discrete new module … bounded and analytically tractable per the V0 spec ('geostatistics 101; reviewers can verify by hand on small cells')," ~2–3 weeks, "Audit-class: explainable."
This spec defines that module.
2. Decision
Add a grid-cell Bayesian update that conditions a prior surface on a set of point sensor observations and produces a posterior distribution per cell (Gaussian: mean grid + variance grid). From the posterior we derive exceedance surfaces P(depth ≥ X) at the named operational thresholds and a disagreement surface that localizes prior-vs-observation conflict.
Model — conjugate Gaussian update per cell. Each cell c carries a prior N(μ₀[c], σ₀²[c]) derived from the composed prior surface. Each sensor observation oₖ = (lat, lon, value, σ) is a Gaussian likelihood N(value, σ²). A sensor's influence on a cell is not uniform — it is scaled by a distance-decay kernel K(d) so the observation's effective precision at cell c is K(dₖc) / σₖ². The posterior of a Gaussian with multiple independent Gaussian observations is the standard precision-weighted closed form:
prior precision τ₀ = 1 / σ₀²
obs precision τₖ(c) = K(dₖc) / σₖ² (kernel scales the sensor's influence at c)
posterior precision τ_post(c) = τ₀ + Σₖ τₖ(c)
posterior mean μ_post(c) = ( τ₀·μ₀ + Σₖ τₖ(c)·valueₖ ) / τ_post(c)
posterior variance σ²_post(c) = 1 / τ_post(c)
This is exact, closed-form, deterministic, and verifiable by hand on a single cell — which is the property the audit plan requires for replay determinism (it names the Bayesian module as "the highest-risk surface" for floating-point variance, §C.1/D).
Kernel — analytically tractable, not full GP regression. Spatial influence falloff is a distance-decay kernel evaluated independently per (sensor, cell) pair. The default and only required kernel for V0 is a Gaussian (squared-exponential) kernel K(d) = exp(−d² / (2·ℓ²)) with length-scale ℓ (an inverse-distance kernel K(d) = 1/(1 + (d/ℓ)ⁿ) is offered as an alternative). We deliberately do not implement full Gaussian-process regression (a global covariance matrix inversion over all cells) for V0. Full GP / kriging is recorded as an Open Question (§10) with the per-pair kernel as the default. Rationale: Invariant 6 favours deterministic, explainable algorithms; the per-pair kernel decomposes per cell into "prior term + one term per sensor" that a reviewer can read off, whereas GP regression's matrix inverse is FP-fragile across platforms and opaque to a per-cell explanation. See §7 for how this resolves the apparent conflict with the fixture's gaussian_process_kernel wording.
Where it sits. bayesian_update is a new composition method that runs after the weighted-prior composition. The weighted sum (SPEC-08) builds the prior surface from the forecast priors; the Bayesian update conditions that prior on the sensor likelihood. They are two stages, not competitors.
3. Public API
All dataclasses frozen; all functions pure (no I/O, no network, no global state). Module: lens/threat/bayesian.py. The posterior artifact lands in lens/models/artifacts.py alongside RiskSurface, Route, etc.
# lens/models/artifacts.py (additions)
from __future__ import annotations
from dataclasses import dataclass, field
from typing import TYPE_CHECKING
if TYPE_CHECKING:
import numpy as np
# numpy resolves locally inside as_numpy() — `import lens` must stay green with
# base deps only (component.prism.packaging DoD); numpy is extras-only.
@dataclass(frozen=True)
class SensorObservation:
"""A single point observation feeding the Bayesian likelihood."""
lat: float
lon: float
value: float # observed quantity (e.g. depth in metres)
sigma: float # observation std-dev; smaller = more trusted
timestamp: str = "" # ISO-8601 UTC; advisory for temporal decay (§10)
sensor_id: str = ""
trust_class: str = "high" # advisory label; sigma carries the actual weight
@dataclass(frozen=True)
class PosteriorSurface:
"""Per-cell Gaussian posterior after conditioning a prior on observations.
mean_grid[r, c] and variance_grid[r, c] define N(mean, variance) for cell (r, c).
Grids are row-major, same shape and bbox convention as RiskSurface.
"""
mean_grid: tuple[tuple[float, ...], ...] = () # immutable, JSON-stable
variance_grid: tuple[tuple[float, ...], ...] = ()
grid_rows: int = 0
grid_cols: int = 0
bbox: tuple[float, float, float, float] = (0.0, 0.0, 0.0, 0.0) # min_lat,min_lon,max_lat,max_lon
observations_used: tuple[SensorObservation, ...] = ()
kernel: str = "gaussian"
length_scale_m: float = 0.0
metadata: dict = field(default_factory=dict)
def exceedance(self, threshold: float) -> tuple[tuple[float, ...], ...]:
"""P(value >= threshold) per cell. Convenience wrapper over exceedance_surface()."""
...
def as_numpy(self) -> tuple[np.ndarray, np.ndarray]:
"""(mean, variance) as 2-D arrays for downstream numeric ops."""
...
# lens/threat/bayesian.py
from __future__ import annotations
from dataclasses import dataclass
import numpy as np
from lens.models.artifacts import PosteriorSurface, SensorObservation
from lens.threat.surface import RiskSurface
@dataclass(frozen=True)
class KernelParams:
"""Distance-decay kernel configuration."""
kind: str = "gaussian" # "gaussian" (default) | "inverse_distance"
length_scale_m: float = 2000.0
inverse_power: float = 2.0 # only for inverse_distance
prior_sigma_floor: float = 0.5 # min prior std-dev so a cell is never infinitely certain
max_influence_m: float = 0.0 # 0 = unbounded; else hard cut-off for sparse compute
def bayesian_update(
prior: RiskSurface | np.ndarray,
observations: tuple[SensorObservation, ...] | list[SensorObservation],
bbox: tuple[float, float, float, float],
kernel: KernelParams = KernelParams(),
prior_variance: np.ndarray | None = None,
) -> PosteriorSurface:
"""Condition a prior surface on point observations → Gaussian posterior per cell.
Closed-form precision-weighted Gaussian conjugate update (§2). Pure and
deterministic: identical inputs → bitwise-identical grids.
Args:
prior: RiskSurface (mean read from cell.risk_score) or a 2-D mean grid.
observations: point sensors, each carrying value + sigma.
bbox: (min_lat, min_lon, max_lat, max_lon) — defines cell centre coords.
kernel: distance-decay kernel + prior variance floor.
prior_variance: optional per-cell prior variance grid; else derived
from kernel.prior_sigma_floor (uniform prior uncertainty).
Returns:
PosteriorSurface with mean_grid + variance_grid.
"""
...
def exceedance_surface(
posterior: PosteriorSurface,
threshold: float,
) -> np.ndarray:
"""P(value >= threshold) per cell, from the Gaussian posterior.
Uses the closed-form normal survival function:
P(X >= t) = 0.5 * erfc( (t - mu) / (sigma * sqrt(2)) )
erfc via math.erfc (deterministic, libm) — no scipy, no sampling.
"""
...
def disagreement_surface(
posterior: PosteriorSurface,
prior: RiskSurface | np.ndarray,
) -> np.ndarray:
"""Per-cell conflict between prior and observation-driven posterior.
Returns a [0, ∞) grid: standardized shift of the posterior mean away from
the prior mean, normalized by posterior uncertainty:
disagreement[c] = |mu_post[c] - mu_prior[c]| / sqrt(var_post[c])
High where sensors pulled a cell hard away from the prior (genuine conflict);
near-zero where prior and sensors agree or no sensor reached the cell.
This is the "where the experts disagree" surface the fixture requires.
"""
...
exceedance at the three named thresholds and disagreement_surface are the two output rasters the Houston lens declares (probability_of_exceedance_raster + disagreement_raster).
4. Evidence integration
A PosteriorSurface is an evidence-block artifact like any other lens output. The orchestrator (run_lens) calls create_evidence_block(spec, inputs, output=<posterior payload>), which hashes the canonicalized output (lens/evidence.py:_deterministic_hash — JSON, sort_keys=True).
Determinism requirements so identical inputs yield identical block hashes (Invariants 3 & 4 — query hash is source of truth; frozen means frozen):
- Observation ordering is canonicalized before update — sort by
(sensor_id, lat, lon, timestamp). Gaussian conjugate update is order-independent in exact arithmetic, but summation order affects floating-point rounding; pinning the order makes the result bitwise-reproducible across runs and machines (the FP-determinism mitigation the audit plan demands, §C.1). - Grids serialize as nested tuples of
float, not numpy arrays — JSON-stable, no dtype/endianness leakage into the hash. - No timestamps in the hashed payload beyond those carried in the canonical observation list. Wall-clock
compute_timestamplives inProvenance, which is outside theresult_hash. - erfc/exp via libm only — CPU floating-point, no GPU, no vectorized BLAS path that could reorder reductions.
The evidence payload contains the mean grid, variance grid, kernel params, the canonical observation list (id + value + sigma — not raw provider blobs), and the threshold set, so the posterior is fully reconstructible from the block.
5. Lens YAML surface
A threat lens declares a bayesian_update composition method alongside the existing weighted-prior layers. The parser (SPEC-02) gains a composition.method field with two values: weighted_composite (today's default) and bayesian_update (this spec).
lens:
id: houston_flood_composition_v1
type: threat
version: 1.0.0
# Stage 1: forecast priors → weighted prior surface (SPEC-08, unchanged)
priors:
- id: nwm ; source: noaa.nwm.retrospective ; weight: 0.30
- id: hec_ras ; source: hcfcd.maapnext.hec_ras_2d ; weight: 0.50
- id: slosh ; source: noaa.slosh.archives ; weight: 0.20
# Stage 2: condition the prior on sensor likelihood (this spec)
composition:
method: bayesian_update # vs. weighted_composite (default)
prior_from: priors # the weighted prior surface is the prior mean
likelihood:
sources:
- id: hcfcd_gauges ; source: hcfcd.gauge_api ; trust_class: high
- id: usgs_nwis ; source: usgs.nwis.iv ; trust_class: high
sigma_from: sensor_history # σ per sensor from its historical variance
kernel:
kind: gaussian # gaussian (default) | inverse_distance
length_scale_m: 2000
prior_sigma_floor: 0.5
output:
exceedance_thresholds_m: [0.5, 1.0, 2.0] # named: vehicle_disable / life_safety / structural
disagreement_surface: true
The orchestrator routes on composition.method: weighted_composite → compute_risk_surface (existing path); bayesian_update → build the prior surface from priors, then call bayesian_update(prior, observations, ...), then exceedance_surface per threshold and disagreement_surface. Lenses without a composition.method field default to weighted_composite (backward compatible — existing packs unaffected).
6. Invariants
- Deterministic. Closed-form conjugate update + libm erfc/exp. Same inputs → bitwise-identical grids → identical evidence-block hash. No sampling, no RNG, no iterative solver, no matrix inversion. (Invariant 4.)
- Pure, no infrastructure. numpy + stdlib
mathonly. No network, no ES, no Dask, no file I/O inside the engine. Observations arrive asSensorObservationtuples; the engine never fetches them. (CLAUDE.md standalone constraint.) - Explainable per cell — required for the audit-class label. For any cell the posterior decomposes into named contributions: the prior term
(τ₀, μ₀)and one term per sensor(sensor_id, distance, K(d), σ, effective_precision, value). A debug accessorexplain_cell(posterior, prior, observations, r, c) -> dictreturns this decomposition so a reviewer can reconstructμ_postby hand — the "verify by hand on small cells" property (Invariant 6: AI assists, humans attest; the human must be able to read the math). - No decisions. The engine outputs probabilities and uncertainty. Threshold-crossing → action mapping is the lens/Sentinel layer, not this module. (Invariant 6.)
- Append-only evidence. Posterior blocks are frozen on creation; recomposition (SPEC-19) appends a new block, never mutates. (Invariant 2.)
7. Houston requirement ↔ Prism architecture — the one conflict, resolved
The fixture's lens sketch (HOUSTON-SOURCES-AND-SPEC.md) writes spatial_extrapolation: gaussian_process_kernel and method: bayesian_update_gaussian_per_sensor. Read literally, "Gaussian process" suggests full GP regression — a global covariance matrix over all cells, inverted once, producing jointly-consistent posterior means and a full covariance. That conflicts with the deterministic/explainable invariant in two ways: (1) the O(N³) matrix inverse is floating-point-fragile across platforms — exactly the replay-determinism risk the audit plan flags for this module; (2) the result is not decomposable into a per-cell "prior + one term per sensor" explanation a human can verify by hand.
Resolution: This spec reads the fixture's intent — Gaussian likelihood + spatially-decaying sensor influence — and implements it with the per-(sensor, cell) Gaussian distance-decay kernel (§2), which is the squared-exponential kernel function that GP regression itself uses, applied independently per pair rather than assembled into a global covariance matrix. This satisfies the fixture's measurable requirements (Gaussian likelihood, spatial extrapolation, posterior exceedance at named thresholds, disagreement surface) and the audit plan's own framing ("geostatistics 101; reviewers can verify by hand"), while staying deterministic and explainable. Full GP regression remains an Open Question (§10) with the per-pair kernel as the standing default — to be adopted only if a fixture demonstrably requires joint posterior covariance and the FP-determinism pre-flight (audit §C.1) passes for the matrix-inverse path. No other Houston requirement conflicts with Prism architecture.
8. Testing — tests/test_bayesian.py
| Test | Asserts |
|---|---|
test_single_cell_conjugate_closed_form |
One cell, one sensor directly on it (K=1). Hand-computed μ_post, σ²_post from τ₀=1/σ₀², τ₁=1/σ₁². Assert bayesian_update matches the closed form to ≤1e-9. The "verify by hand" anchor. |
test_no_observations_returns_prior |
Empty observation list → posterior mean == prior mean, posterior variance == prior variance floor. |
test_distant_sensor_no_effect |
Sensor far beyond the length-scale → posterior ≈ prior at distant cells (kernel → 0); near-zero shift. |
test_kernel_monotone_falloff |
Influence on a cell decreases monotonically with distance for both gaussian and inverse_distance kernels. |
test_exceedance_at_named_thresholds |
For a cell with known (μ, σ), exceedance_surface at {0.5, 1.0, 2.0} matches 0.5*erfc((t-μ)/(σ√2)) to ≤1e-9; ordering P(≥0.5) ≥ P(≥1.0) ≥ P(≥2.0). |
test_exceedance_bounds |
All exceedance probabilities ∈ [0, 1]; μ ≫ t → ≈1; μ ≪ t → ≈0. |
test_disagreement_sanity |
Sensor value far from prior near a cell → high disagreement there; prior==observation → ≈0; no-sensor cell → ≈0. |
test_determinism_hash_stability |
Run bayesian_update twice (observations shuffled between runs) → identical grids; create_evidence_block → identical result_hash. Canonical ordering proof. |
test_explain_cell_reconstructs_posterior |
explain_cell terms (prior + per-sensor) recombine to the reported μ_post/σ²_post. Explainability invariant. |
test_houston_mini_fixture_integration |
8×8 grid, weighted prior from 2 mock priors, 3 mock gauges (one in conflict). End-to-end: prior → bayesian_update → exceedance_surface ×3 → disagreement_surface → evidence block. Asserts shapes, exceedance ordering, and the conflict cell shows elevated disagreement. The fixture-shaped acceptance test (data inline; no network, no themis dependency). |
All tests CPU-only, pure-Python, deterministic; no fixtures fetched.
9. Acceptance criteria
- [ ]
SensorObservationandPosteriorSurfacefrozen dataclasses added tolens/models/artifacts.py - [ ]
bayesian_update(prior, observations, bbox, kernel) -> PosteriorSurfaceimplemented, closed-form, pure - [ ]
exceedance_surface(posterior, threshold) -> np.ndarrayvia libm erfc; thresholds{0.5, 1.0, 2.0}exercised - [ ]
disagreement_surface(posterior, prior) -> np.ndarrayimplemented - [ ] Gaussian (default) and inverse-distance kernels implemented via
KernelParams - [ ]
explain_cellper-cell decomposition accessor implemented - [ ] Posterior surfaces hash deterministically through
create_evidence_block(canonical observation ordering) - [ ] Parser accepts
composition.method: bayesian_update; orchestrator routes to this engine; absent field defaults toweighted_composite(backward compatible) - [ ]
pytest tests/test_bayesian.pygreen — all tests in §8
10. What themis measures
The Houston-federated fixture (themis, external) instruments this engine against FED-SPEC-04 §2.B (Cost-Lens Performance) and the audit plan's benchmarks:
- 04.B.1 Surface continuity — posterior mean/variance grids are C0-continuous; no spurious discontinuities at sensor locations or kernel edges.
- 04.B.2 Weight sensitivity —
∂(posterior)/∂(length_scale, prior_sigma_floor, σ)bounded; small input perturbation → small output change. - 04.B.5 / 04.B.6 Latency / throughput — raster-heavy compute; single-update p50/p99 and decisions/sec on TRM-1.
- Replay determinism (FED-SPEC-02, audit §C.1) — the named pre-flight: run
bayesian_updateon two Macs with identical inputs; assert bitwise-identical grids before the fixture claims replay determinism. This module is flagged as the highest-FP-risk surface, so this gate is mandatory. - Provenance completeness (audit §C.2) — the evidence block reconstructs the posterior from its inputs (priors + canonical observations + kernel), no dangling references across the three federates.
- Posterior + disagreement output type (audit §B.2) — the fixture's headline output:
P(depth ≥ X)at{0.5m, 1m, 2m}plus the per-cell disagreement raster, both produced by this engine.
11. Open questions (with defaults)
- Full GP / kriging vs. per-pair kernel. Default: per-pair Gaussian distance-decay kernel (§2/§7). Adopt full GP regression (global covariance, joint posterior) only if a fixture demonstrably needs joint cross-cell posterior covariance and the FP-determinism pre-flight passes for the matrix-inverse path. The per-pair kernel is the standing, audit-class-explainable default.
- Non-Gaussian likelihoods. Some sensors (binary "flooded / not flooded" citizen reports, censored gauge readings at rail limits) are not Gaussian. Default: Gaussian only for V0; model low-trust/binary sources by inflating
σ(down-weighting) rather than changing the likelihood family. Beta/Bernoulli likelihoods are a future extension behind alikelihood_familyfield — they break the conjugate closed form and need a determinism story first. - Temporal decay of observations. A gauge reading from 3 hours ago is weaker evidence than one from 5 minutes ago. Default: no temporal decay in V0 —
timestampis carried onSensorObservation(advisory) but does not scale precision. When added, decay scalesσ(or the kernel) by an explicit half-life parameter so it stays in the closed form and remains deterministic; this is the natural tie-in to SPEC-19 incremental recomposition (a stale observation's footprint shrinks over time). - Prior variance source. Default: uniform
prior_sigma_flooracross cells. A future refinement derives per-cell prior variance from the inter-prior spread (the variance of NWM/HEC-RAS/SLOSH at each cell), making the prior less certain exactly where the forecasts already disagree — which would also feed the disagreement surface. Deferred to keep V0's prior simple and hand-verifiable.
12. Cross-references
| Document | Relationship |
|---|---|
| SPEC-08 (axonis-lens) | Threat Engine — compute_risk_surface builds the weighted prior surface this engine conditions; shares the grid/cell + bbox model. |
| SPEC-04 (axonis-lens) | Evidence Model — posterior surfaces are frozen evidence blocks via create_evidence_block; determinism rules in §4. |
| SPEC-02 (axonis-lens) | Lens Parser — extended with composition.method: bayesian_update (§5). |
| SPEC-19 (axonis-lens) | Incremental Updates — the streaming case of this engine: a gauge update re-runs bayesian_update over the affected footprint; threshold crossings on the posterior become signals. Temporal-decay open question ties in here. |
HOUSTON-FEDERATED-V0-AUDIT-PLAN.md (themis) |
§B.2 names this gap; §E.1 scopes the module (~2–3 weeks, "analytically tractable", audit-class explainable); §C.1 the FP-determinism pre-flight. |
HOUSTON-SOURCES-AND-SPEC.md (themis) |
Sources table (priors vs. observations vs. truth) + the houston_flood_composition_v1 lens sketch this spec implements. |
HOUSTON-PLAIN-LANGUAGE.md (themis) |
The decision-maker requirement: probability of exceedance + disagreement, in plain terms. |
| FED-SPEC-04 §2.B (themis) | Cost-Lens performance benchmarks this engine is measured against (§10). |
| FED-SPEC-02 (themis) | Replay determinism — this engine is the highest-FP-risk surface; §4 + §10 define the mitigation and pre-flight gate. |
Owner: CPO. Houston-federated fixture (themis, external) is the driving initiative. Next review when the single-cell conjugate test and the Houston mini-fixture integration test land green.
Depends on: component.prism.evidence-model, component.prism.packaging, component.prism.threat-engine, component.prism.universal-lens-parser
Realizes: product.lens