Cortex — Data Shape Contract

Eliminate frontend guessing by making Cortex emit stable, render-ready data shapes while keeping (1) ABAC safety, (2) dataspace types sacred, and (3) visualization choice in Beacon's ViewerController. Cortex normalizes polymorphic Elasticsearch aggregation responses into canonical data shapes; Beacon renders from those, never from raw ES JSON.

Vision (Non-negotiables)

Blocks Are Evidence, Not Views

A Block is an immutable-ish evidence object produced by Cortex and rendered by Beacon. Blocks carry provenance and enough shape metadata to render deterministically. Blocks MUST NOT encode a "viz type" as part of what they are (no "timeseries chart", "heatmap chart"): Beacon chooses how to render, and any viz_hints a block carries are recommendations only.

Deterministic Shape, Emitted by Cortex

ES aggregation responses are polymorphic. Beacon must not reverse-engineer semantics from raw ES JSON. Cortex normalizes results into canonical data shapes and (optionally) derived projections. Raw ES responses may be included for audit/debug, but viewers should not depend on them.

Dataspace Types Are Sacred

Do not mutate uds_type or inferred domain types inside describe_model. Use render_hints and column_roles as secondary, non-authoritative metadata only.

ABAC Safety

Logs and shape metadata MUST NOT leak ABAC-hidden field names. If a user cannot see a field, blocks must not reveal its name or existence. Shape fingerprints are computed from operator structure, not field names.

Goals

Make aggregation rendering deterministic across:

  • terms + metrics
  • date_histogram + terms + metrics
  • filters (multi-window temporal buckets)
  • geohash_grid / geotile_grid + metrics
  • composite aggs with after_key pagination
  • pipeline metrics (bucket_script / derivatives)

Make new shape variants visible during testing via a stable shape signature + fingerprint and first-seen logging.

Keep the UI simple: Beacon renders from canonical projections (bucket_rows, geo_cells, etc.); ViewerController chooses the best viewer from column_roles/hints and projection availability.

Block Payload Contract

Common Block Metadata (required)

  • id, ts
  • data_source_type (e.g. elasticsearch)
  • model, source
  • block_kind (one of query_result | artifact_evidence | ai_summary | kpi | schema)
  • evidence_class (granular sub-type for routing: query | aggregation | geo_agg | timeseries | multi_agg | kpi | schema | histogram | model_card | dataset_card)
  • materialization_mode (live | frozen, replaces storage_mode)
  • lifecycle_stage (transient | curated | frozen)
  • origin_surface (explore | monitor)
  • source_tool, query_hash
  • optional query_group_id (group sibling blocks from one user request), sibling_index / sibling_count (multi-block responses)

Outcome Envelope (required)

outcome (OK | NO_DATA | PARTIAL | ERROR | POLICY_DENIED); warnings[] (strings, no ABAC leaks); errors[] (no secrets, safe summaries); optional federation: { federates_ok, federates_total }.

Shape ID (required for aggregation/geo_agg)

shape_signature (human-readable, no field names); shape_fingerprint (sha256 of canonical signature JSON); shape_features[] (e.g. ["filters_named_keys", "composite_after_key", "pipeline_metrics"]).

Column Metadata

column_meta maps output column keys to roles and hints (optional but strongly recommended): role (dimension | metric | time | geo | id); optional unit; optional render_hint (date_like_iso_string | wkt_point_string | rate | magnitude | categorical | ordinal); optional order (explicit ordering, e.g. ["w15m","w30m","w60m","w6h","w24h"]).

column_meta is advisory; it MUST NOT contradict ABAC or overwrite uds_type.

Visualization Hints

Cortex emits viz_hints to guide Beacon's ViewerController. These are recommendations, not commands.

viz_hints.recommended (required)

Value            | evidence_class         | Beacon renderer  | Description
table            | query, aggregation     | renderTable      | Default tabular view
kpi_card         | kpi                    | renderKPI        | Key metric cards
bar_chart        | aggregation, histogram | renderBar        | Vertical bar chart
timeseries_chart | timeseries             | renderTimeseries | Line chart with time axis
geo_grid         | geo_agg                | renderGeoGrid    | Grid cells with color scale + legend
geo_point        | query (with geo)       | renderGeo        | Point markers on map
geo_heat         | geo_agg                | renderGeoHeatmap | Heat intensity layer
tabbed_view      | multi_agg              | renderTabbed     | Tabbed view for multiple aggs
schema           | schema                 | renderSchema     | Field/type table

viz_hints.alternatives (optional): array of alternative viz types the user can switch to, e.g. ["table", "bar_chart"].

MCP Response Envelope

All MCP tools that return data MUST use this envelope. Beacon reads from structuredContent directly.

{
  "success": true,
  "block_kind": "query_result | artifact_evidence | ai_summary | kpi | schema",
  "evidence_class": "query | aggregation | geo_agg | timeseries | ...",
  "projections": {
    "rows": [],        "columns": [],
    "bucket_rows": {}, "geo_cells": {}
  },
  "column_meta": [ {"field": "sensor.site", "label": "Site", "role": "dimension", "type": "string"} ],
  "viz_hints": { "recommended": "table", "alternatives": ["bar_chart"] },
  "block": {}
}

Key rule: Beacon reads structuredContent.projections.rows and structuredContent.column_meta, so both MUST sit at the top level of structuredContent, not nested inside block. The block field carries the full shape-contract-compliant block for storage/curation.
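The key rule can be sketched as a small helper that hoists the fields Beacon reads to the top level while keeping the full block alongside. This is a minimal illustration, not Cortex's actual code: the function name build_structured_content, the explicit success argument, and the fallback to "table" when a block carries no viz_hints are all assumptions.

```python
from typing import Any


def build_structured_content(block: dict[str, Any], success: bool = True) -> dict[str, Any]:
    """Build the MCP envelope from a full block (hypothetical helper)."""
    return {
        "success": success,
        "block_kind": block["block_kind"],
        "evidence_class": block["evidence_class"],
        # Top-level copies: Beacon reads these, never block["projections"].
        "projections": block.get("projections", {}),
        "column_meta": block.get("column_meta", []),
        # Assumption: fall back to the default tabular view when no hint exists.
        "viz_hints": block.get("viz_hints", {"recommended": "table"}),
        # The full block travels along for storage/curation only.
        "block": block,
    }
```

With this layout a viewer never has to look inside block to find rows or column metadata.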

Canonical Data Shapes

query (rows)

For "hits"-style queries. Required: projections.rows (data rows), projections.columns (visible fields only). Optional: column_meta, total_hits, sampling.

aggregation

Used when ES returns aggregations. Required: aggregation_tree (raw ES agg response, safe subset or full per policy); projections (at least one of the below, preferably bucket_rows whenever buckets exist).

  • projections.bucket_rows (preferred default) — a flattened, stable fact table: dimensions: [string], metrics: [string], rows: [object] (each row has all dims + metrics). Rules: always include doc_count if buckets exist; dimension keys use stable names chosen by Cortex (not ABAC-sensitive raw field names); missing metric → null (preferred) or 0; for filters agg include a window dimension using the filter keys.
  • projections.series_rows (optional) — when the aggregation naturally represents series: x_key, optional series_key, metrics, rows.
  • projections.pivot (optional) — a pivoted/wide table: row_key, column_key, value_key, rows.
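As a rough illustration of the bucket_rows rules above, the sketch below flattens a single terms aggregation with single-value metric sub-aggregations. The function name and its parameters are hypothetical; it assumes the caller passes a Cortex-chosen dimension name (not the raw field name) and that doc_count is listed among the metrics.

```python
from typing import Any


def flatten_terms_agg(agg: dict[str, Any], dim_name: str,
                      metric_names: list[str]) -> dict[str, Any]:
    """Flatten one ES terms aggregation response into a bucket_rows projection."""
    rows = []
    for bucket in agg.get("buckets", []):
        # doc_count is always included when buckets exist.
        row = {dim_name: bucket["key"], "doc_count": bucket["doc_count"]}
        for name in metric_names:
            # Single-value metric aggs look like {"value": ...}; a metric that
            # is absent from the bucket becomes None (null), not 0.
            row[name] = bucket.get(name, {}).get("value")
        rows.append(row)
    return {
        "dimensions": [dim_name],
        "metrics": ["doc_count", *metric_names],
        "rows": rows,
    }
```

Every row carries all dimensions and all metrics, so a table viewer can render it without inspecting the raw aggregation tree.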

geo_agg

For geohash_grid / geotile_grid. Required: geo_cells (rows describing grid cells with location + metrics). Minimum geo_cells shape: cell_id (geohash or geotile zoom/x/y), metrics: {doc_count, ...}. Optional: centroid_lat/centroid_lon (only if ES provides geo_centroid), bounds. Beacon's ViewerController decodes cell_id natively (_decodeCellId() handles both geohash and geotile); Cortex does NOT pre-compute centroids from cell_id. Optional: join_keys, projections.bucket_rows if geo is one dimension among others.
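A minimal sketch of the geo_cells shape, assuming a geohash_grid or geotile_grid response whose buckets may carry a geo_centroid sub-aggregation. The function name and the centroid_name parameter are hypothetical. Centroids are copied only when ES returned them and are never derived from cell_id.

```python
from typing import Any, Optional


def to_geo_cells(grid_agg: dict[str, Any], metric_names: list[str],
                 centroid_name: Optional[str] = None) -> list[dict[str, Any]]:
    """Turn grid buckets into geo_cells rows."""
    cells = []
    for bucket in grid_agg.get("buckets", []):
        metrics = {"doc_count": bucket["doc_count"]}
        for name in metric_names:
            metrics[name] = bucket.get(name, {}).get("value")
        # The bucket key is the geohash, or "zoom/x/y" for geotile_grid.
        cell = {"cell_id": bucket["key"], "metrics": metrics}
        # Optional centroid: present only if a geo_centroid sub-agg was run.
        loc = bucket.get(centroid_name, {}).get("location") if centroid_name else None
        if loc:
            cell["centroid_lat"] = loc["lat"]
            cell["centroid_lon"] = loc["lon"]
        cells.append(cell)
    return cells
```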

timeseries

For date_histogram aggregations. Required: projections.series_rows with x_key (time dimension), metrics, rows; also projections.bucket_rows for table fallback. Optional: series_key for stacked/grouped series.
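The series_rows projection might be produced along these lines. This is a sketch under assumptions: the names to_series_rows, "ts", and "series" are illustrative, and the input is a date_histogram response with an optional nested terms aggregation named by series_agg. The same rows can be reused to fill bucket_rows for the table fallback.

```python
from typing import Any, Optional


def _row(dims: dict[str, Any], bucket: dict[str, Any],
         metric_names: list[str]) -> dict[str, Any]:
    row = dict(dims)
    row["doc_count"] = bucket["doc_count"]
    for name in metric_names:
        row[name] = bucket.get(name, {}).get("value")  # missing metric -> None
    return row


def to_series_rows(hist_agg: dict[str, Any], metric_names: list[str],
                   series_agg: Optional[str] = None) -> dict[str, Any]:
    """Flatten date_histogram buckets (optionally split by terms) into series_rows."""
    rows = []
    for time_bucket in hist_agg.get("buckets", []):
        # Prefer the formatted timestamp when ES provides key_as_string.
        ts = time_bucket.get("key_as_string", time_bucket["key"])
        if series_agg is None:
            rows.append(_row({"ts": ts}, time_bucket, metric_names))
        else:
            for sub in time_bucket[series_agg]["buckets"]:
                rows.append(_row({"ts": ts, "series": sub["key"]}, sub, metric_names))
    out = {"x_key": "ts", "metrics": ["doc_count", *metric_names], "rows": rows}
    if series_agg is not None:
        out["series_key"] = "series"
    return out
```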

multi_agg

When a query contains multiple top-level aggregations or deeply nested bucket aggregations. Required: projections.bucket_rows (flattened fact table with all dimensions and metrics); agg_blocks (per-aggregation structured data for separate rendering). Recommended viz_hint: tabbed_view; falls back to table.

Hard Shapes To Support

  • Multi-window temporal filters per site (terms → filters): compute metrics across multiple lookback windows per site. Pattern: terms(site_id) → filters(window keys) → metric(s). Projection: bucket_rows dims ["site_id","window"]; window order w15m, w30m, w60m, w6h, w24h (explicit in column_meta.window.order).
  • Geohash/grid aggregation: geo_grid(cell) → metric(s), optionally nested with terms(category). Projection: geo_cells primary, bucket_rows secondary for multi-dim analysis.
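The multi-window pattern can be sketched as follows, assuming the filters aggregation uses named keys (so ES returns its buckets as an object keyed by window) and that column_meta is a map keyed by output column. The function name, filters_name parameter, and COLUMN_META constant are illustrative only.

```python
from typing import Any

WINDOW_ORDER = ["w15m", "w30m", "w60m", "w6h", "w24h"]

# Explicit ordering travels in column_meta so viewers never sort window keys lexically.
COLUMN_META = {
    "site_id": {"role": "dimension"},
    "window": {"role": "dimension", "render_hint": "ordinal", "order": WINDOW_ORDER},
}


def flatten_site_windows(sites_agg: dict[str, Any], filters_name: str,
                         metric_names: list[str]) -> dict[str, Any]:
    """Flatten terms(site_id) -> filters(window keys) -> metrics into bucket_rows."""
    rows = []
    for site in sites_agg.get("buckets", []):
        windows = site[filters_name]["buckets"]  # dict keyed by filter name
        for window in WINDOW_ORDER:
            if window not in windows:
                continue
            bucket = windows[window]
            row = {"site_id": site["key"], "window": window,
                   "doc_count": bucket["doc_count"]}
            for name in metric_names:
                row[name] = bucket.get(name, {}).get("value")
            rows.append(row)
    return {
        "dimensions": ["site_id", "window"],
        "metrics": ["doc_count", *metric_names],
        "rows": rows,
    }
```

Rows come out in window order per site regardless of the key order in the ES response.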

Shape Fingerprinting and Logging

Shape signature rules: do NOT include field names unless confirmed visible & safe. Include:

  • bucket operator sequence (terms/date_histogram/filters/geotile_grid/composite/nested)
  • bucket depth
  • count of filter keys
  • presence of key_as_string, after_key, pipeline metrics
  • metric operator types present (sum, avg, max, percentiles, bucket_script)

Emit structured logs: NEW_AGG_SHAPE (WARN) when a fingerprint is first seen, AGG_SHAPE (DEBUG/INFO) when known, including fingerprint, signature, tool, model, query_hash, query_group_id. Optionally persist fingerprint counts to the insights index.
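One possible way to derive a field-name-free signature and fingerprint is sketched below. It is a simplification under stated assumptions: it walks the aggregation request body only, so response-only features such as key_as_string and after_key are not detected; aggregation names are ignored because they may echo field names; and all function names and signature keys are hypothetical.

```python
import hashlib
import json
from typing import Any

BUCKET_OPS = {"terms", "date_histogram", "filters", "geohash_grid",
              "geotile_grid", "composite", "nested"}
PIPELINE_OPS = {"bucket_script", "derivative"}
_NON_OPERATOR_KEYS = {"aggs", "aggregations", "meta"}


def shape_signature(aggs: dict[str, Any]) -> dict[str, Any]:
    """Describe operator structure only; no field or aggregation names."""
    bucket_ops: list[str] = []
    metric_ops: set[str] = set()
    features: set[str] = set()
    stats = {"depth": 0, "filter_keys": 0}

    def walk(node: dict[str, Any], depth: int) -> None:
        for body in node.values():  # agg names are deliberately ignored
            op = next(k for k in body if k not in _NON_OPERATOR_KEYS)
            if op in BUCKET_OPS:
                bucket_ops.append(f"{depth}:{op}")
                stats["depth"] = max(stats["depth"], depth + 1)
                if op == "filters":
                    keyed = body["filters"].get("filters", {})
                    stats["filter_keys"] += len(keyed)
                    if isinstance(keyed, dict):
                        features.add("filters_named_keys")
            else:
                metric_ops.add(op)
                if op in PIPELINE_OPS:
                    features.add("pipeline_metrics")
            walk(body.get("aggs") or body.get("aggregations") or {}, depth + 1)

    walk(aggs, 0)
    return {
        "bucket_ops": sorted(bucket_ops),
        "bucket_depth": stats["depth"],
        "filter_key_count": stats["filter_keys"],
        "metric_ops": sorted(metric_ops),
        "features": sorted(features),
    }


def shape_fingerprint(signature: dict[str, Any]) -> str:
    """sha256 over a canonical (sorted keys, compact) JSON encoding."""
    canonical = json.dumps(signature, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def shape_log_event(fingerprint: str, seen: set[str]) -> str:
    """Return the log event name, recording first sightings in `seen`."""
    if fingerprint in seen:
        return "AGG_SHAPE"
    seen.add(fingerprint)
    return "NEW_AGG_SHAPE"
```

Two requests with the same operator structure but different fields or aggregation names yield the same fingerprint, which is what makes first-seen logging safe under ABAC.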

Viewer Contract (Beacon)

Beacon renders using projections first: prefer projections.bucket_rows or geo_cells; raw aggregation_tree is debug-only. ViewerController selection considers block_kind, presence of projections, column_meta roles/hints, and shape_features (e.g. composite pagination). On viewer failure: log viewer_id + shape_fingerprint + missing_requirements (e.g. bucket_rows missing).
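The selection and failure-logging behaviour could look roughly like the sketch below. It is written in Python for consistency with the other examples, not as Beacon's actual ViewerController. The REQUIRES table, the function name, and the rule that a projection counts as available only when present and non-empty are assumptions, and only a handful of viewers are modelled.

```python
from typing import Any, Optional

# Hypothetical mapping: a viewer is usable if ANY listed projection is available.
REQUIRES = {
    "table": ("rows", "bucket_rows"),
    "bar_chart": ("bucket_rows",),
    "timeseries_chart": ("series_rows",),
    "geo_grid": ("geo_cells",),
    "geo_heat": ("geo_cells",),
}


def select_viewer(content: dict[str, Any]) -> tuple[Optional[str], list[dict[str, Any]]]:
    """Pick the first hinted viewer whose projection exists; collect failures to log."""
    projections = content.get("projections") or {}
    hints = content.get("viz_hints") or {}
    # Try the recommendation, then the alternatives, then the table fallback.
    candidates = [hints.get("recommended", "table"),
                  *hints.get("alternatives", []), "table"]
    failures = []
    for viewer_id in candidates:
        needed = REQUIRES.get(viewer_id, ())
        if any(projections.get(name) for name in needed):
            return viewer_id, failures
        failures.append({
            "viewer_id": viewer_id,
            "shape_fingerprint": content.get("shape_fingerprint"),
            "missing_requirements": list(needed),
        })
    return None, failures
```

The raw aggregation_tree is never consulted here; a viewer either finds its projection or reports what was missing.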

Acceptance Criteria

  1. Any aggregation block with buckets MUST include projections.bucket_rows (unless outcome ∉ {OK, NO_DATA}).
  2. Geo aggregations MUST include geo_cells.
  3. No viewer should parse raw ES aggregation JSON in the normal render path.
  4. New agg shapes must be discoverable via NEW_AGG_SHAPE logs.
  5. No ABAC-hidden field names appear in logs or block metadata.
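Criteria 1 and 2 lend themselves to an automated check in tests. The sketch below is one way to express them, assuming geo_cells lives under projections (as in the MCP envelope) and that criterion 1 is applied to blocks whose evidence_class is aggregation. The function names are hypothetical.

```python
from typing import Any


def _has_buckets(node: Any) -> bool:
    """True if any non-empty "buckets" entry exists anywhere in the raw agg tree."""
    if isinstance(node, dict):
        if node.get("buckets"):
            return True
        return any(_has_buckets(value) for value in node.values())
    if isinstance(node, list):
        return any(_has_buckets(value) for value in node)
    return False


def contract_violations(block: dict[str, Any]) -> list[str]:
    """Return a list of acceptance-criteria violations (empty when compliant)."""
    problems = []
    projections = block.get("projections") or {}
    evidence_class = block.get("evidence_class")
    # Criterion 1: buckets imply bucket_rows when outcome is OK or NO_DATA.
    if (evidence_class == "aggregation"
            and block.get("outcome") in ("OK", "NO_DATA")
            and _has_buckets(block.get("aggregation_tree"))
            and "bucket_rows" not in projections):
        problems.append("bucket_rows missing")
    # Criterion 2: geo aggregations always carry geo_cells.
    if evidence_class == "geo_agg" and "geo_cells" not in projections:
        problems.append("geo_cells missing")
    return problems
```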

Open Decisions

  • Missing metric values: null vs 0 (recommend null).
  • Bucket truncation/topN: include warnings[] when truncated.
  • Composite pagination: include after_key in raw + optionally in shape_features.
  • Federation partials: outcome=PARTIAL + warnings.


Depends on: component.cortex.block-card, component.cortex.intelligence

Realizes: product.block