Sentinel — Alerting and Monitoring Service

Status: Implemented. The sentinel/ repo is active with a FastAPI server, MCP mount, and alerting domain objects (AlertEvent, Notification, Sensor, Subscriber) migrated from rest's userspace.
Package: sentinel (Python pkg server/)
Depends on: platform.axonis-core, platform.service-contract
Milestone: P2 (after axonis-core is published)

Repo-local spec. Cross-cutting service mechanics (auth, /health + /service-info, Helm chart, CI/CD, uv/hatchling packaging, ruff) are not covered here — they follow platform.service-contract and its Cross-Cutting Requirements. This spec covers what makes the alerting service unique.

Who & why

Trigger: As an operator monitoring federated sites, I want a dedicated alerting service that ingests alert events from any source, evaluates thresholds, and routes notifications, so that the alert lifecycle (acknowledge → resolve → escalate) is managed in one place with domain-specific tools instead of generic CRUD.

Current pain. Alerting objects (AlertEvent, Notification, Sensor, Subscriber) lived in the alerts Elasticsearch index and were reached only through fedai-rest's generic userspace CRUD. There was no alert-specific workflow (no acknowledge/resolve/escalate), no threshold evaluation, and no uniform way to record where an alert came from — so cross-source alert handling and routing had to be reimplemented per caller.

Job to be done. Own the full alert lifecycle across heterogeneous sources (sensors, rules engines, ML models, humans, external systems) under one source-identity model, with first-class workflow tools and notification routing.

Out of scope. Raw data ingestion (Conduit, component.conduit.service); device writes back to sensors (Relay, platform.relay); and all cross-cutting service scaffolding (auth middleware, health/service-info contract, Helm, CI/CD, packaging) which platform.service-contract already mandates for every service.

Purpose

Sentinel is the alerting and monitoring microservice. It owns all alert lifecycle management: event ingestion, threshold evaluation, notification dispatch, sensor management, and subscriber routing.

Previously, alerting objects (AlertEvent, Notification, Sensor, Subscriber) were stored in the alerts Elasticsearch index and accessed via fedai-rest's generic userspace CRUD. Sentinel extracts these into a dedicated service with domain-specific tools for alert management workflows.

Scope: generic alerting, sensors as one source type

Sentinel is a generic alerting service. Alerts may originate from many kinds of sources, of which sensors are one common case. Other valid sources include:

Rules engines (transaction-monitoring rule trips)
ML models emitting an anomaly score
Scheduled audit jobs finding violations
Humans manually escalating events
External systems posting alerts via the API

Every AlertEvent carries a source identity consisting of:

source_type — one of sensor | rule | model | human | external
source_uid — the UID of the specific origin within that type (e.g. the sensor's UID when source_type=sensor)

Source-specific resources (e.g. alerts://sensors/{sensor_id}/history) filter alerts where source_type matches the source kind named in the resource path (sensors → sensor) and source_uid matches the path parameter. Other source types may add their own resources without changing the core alert model.

Sensor objects remain the registry of configured sensor sources (their thresholds, sites, types). Non-sensor sources are not registered in the Sensor index — they identify themselves only at alert-creation time via source_type + source_uid.
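
The source-identity model can be sketched as follows. This is a minimal illustration that assumes alerts are plain dicts; `SourceIdentity`, `SOURCE_TYPES`, and `alerts_for_source` are hypothetical names and are not part of axonis-core.

```python
from dataclasses import dataclass

# The five origin kinds listed in this spec.
SOURCE_TYPES = {"sensor", "rule", "model", "human", "external"}


@dataclass(frozen=True)
class SourceIdentity:
    """Hypothetical value object pairing source_type with source_uid."""
    source_type: str
    source_uid: str

    def __post_init__(self):
        if self.source_type not in SOURCE_TYPES:
            raise ValueError(f"unknown source_type: {self.source_type!r}")
        if not self.source_uid:
            raise ValueError("source_uid must be non-empty")


def alerts_for_source(alerts, identity):
    """Return alerts whose source identity matches, as a source-specific resource would."""
    return [
        a for a in alerts
        if a.get("source_type") == identity.source_type
        and a.get("source_uid") == identity.source_uid
    ]


alerts = [
    {"alert_id": "a1", "source_type": "sensor", "source_uid": "hcfcd_001"},
    {"alert_id": "a2", "source_type": "rule", "source_uid": "rule_42"},
    {"alert_id": "a3", "source_type": "sensor", "source_uid": "hcfcd_002"},
]
history = alerts_for_source(alerts, SourceIdentity("sensor", "hcfcd_001"))
```

Because non-sensor sources are not registered anywhere, validation in this sketch is limited to the source_type vocabulary and a non-empty source_uid.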

Domain Objects (from axonis-core userspace)

Object	Schema constant	ES index	Purpose
AlertEvent	`Schema.ALERT_EVENT`	alerts	Triggered alert with severity, site, status; identifies its origin via `source_type` + `source_uid`
Notification	`Schema.NOTIFICATION`	alerts	Dispatched notification record
Sensor	`Schema.SENSOR`	alerts	One kind of alert source — registers a configured sensor (threshold, type, site). Non-sensor sources self-identify at alert-create time and are not registered here.
Subscriber	`Schema.SUBSCRIBER`	alerts	Alert routing target (user, channel, webhook)
AlertSubscriber	`Schema.ALERT_SUBSCRIBER`	alerts	Subscription binding (source → subscriber)
AlertThreshold	`Schema.THRESHOLD`	alerts	Configurable threshold definition

Code Structure

server/
  __init__.py
  __main__.py                  # Starlette: /agentspace, /api/v1, /health, /service-info
  api/
    __init__.py
    routes.py                  # FastAPI REST endpoints
    schema/
      alerting_objects.yml     # OpenAPI component schemas
  mcp/
    __init__.py
    server.py                  # FastMCP tools + resources
    commands.py                # Command layer (shared by REST + MCP)

Port and API Version

Port: 8005
API version: /api/v1

MCP Tools (18)

CRUD (12)

Tool	Purpose
`alert_list(summary, status, severity, site_id)`	List alerts with optional filters
`alert_get(uid)`	Get alert by UID
`alert_create(body)`	Create alert event
`alert_update(uid, body)`	Update alert (e.g., change status)
`sensor_list(summary, sensor_type)`	List sensors with optional type filter
`sensor_get(uid)`	Get sensor by UID
`sensor_create(body)`	Create sensor definition
`sensor_update(uid, body)`	Update sensor
`subscriber_list(summary)`	List subscribers
`subscriber_get(uid)`	Get subscriber by UID
`subscriber_create(body)`	Create subscriber
`notification_list(summary, alert_id)`	List notifications, optionally by alert

Workflow (4)

Tool	Purpose
`alert_acknowledge(uid, acknowledged_by, note)`	Acknowledge an alert — sets status to ACKNOWLEDGED
`alert_resolve(uid, resolved_by, resolution)`	Resolve an alert — sets status to RESOLVED
`alert_escalate(uid, escalate_to, reason)`	Escalate an alert — creates notification to escalation target
`alert_evaluate(sensor_id, value)`	Evaluate a value against a sensor's threshold — returns whether alert should trigger

Introspection (2)

Tool	Purpose
`alert_summary(time_range, group_by)`	Summary of alerts by severity/site/status over a time range
`sensor_status(sensor_id)`	Current state of a sensor: last triggered, alert count, health
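
As a rough illustration of what alert_summary computes, the sketch below counts alerts inside a time window and groups them by one field. It assumes alerts are dicts with an ISO 8601 `triggered_at` and that `time_range` is a `timedelta`; the spec does not fix the parameter types, and `summarize_alerts` is a hypothetical helper rather than the service's implementation.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def _parse_ts(value):
    # Normalize a trailing "Z", which fromisoformat only accepts from Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_alerts(alerts, group_by, time_range, now=None):
    """Count alerts triggered within the last time_range, grouped by one field."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - time_range
    counts = Counter(
        alert.get(group_by, "unknown")
        for alert in alerts
        if _parse_ts(alert["triggered_at"]) >= cutoff
    )
    return dict(counts)


alerts = [
    {"severity": "high", "site_id": "hcfcd_001", "triggered_at": "2026-03-01T14:30:00Z"},
    {"severity": "critical", "site_id": "hcfcd_001", "triggered_at": "2026-03-01T14:45:00Z"},
    {"severity": "high", "site_id": "hcfcd_002", "triggered_at": "2026-03-01T09:00:00Z"},
]
now = datetime(2026, 3, 1, 15, 0, tzinfo=timezone.utc)
by_severity = summarize_alerts(alerts, "severity", timedelta(hours=1), now=now)
by_site = summarize_alerts(alerts, "site_id", timedelta(hours=24), now=now)
```

In a deployed service the same grouping would more likely be an Elasticsearch aggregation; the in-memory version only shows the intended semantics.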

MCP Resources (2)

URI	Purpose
`alerts://active`	List of currently active (unresolved) alerts
`alerts://sensors/{sensor_id}/history`	Recent alert history for a sensor

REST Endpoints

CRUD

Method	Path	Maps to MCP tool
GET	`/api/v1/alert`	alert_list
GET	`/api/v1/alert/{uid}`	alert_get
POST	`/api/v1/alert`	alert_create
POST	`/api/v1/alert/{uid}`	alert_update
GET	`/api/v1/sensor`	sensor_list
GET	`/api/v1/sensor/{uid}`	sensor_get
POST	`/api/v1/sensor`	sensor_create
POST	`/api/v1/sensor/{uid}`	sensor_update
GET	`/api/v1/subscriber`	subscriber_list
GET	`/api/v1/subscriber/{uid}`	subscriber_get
POST	`/api/v1/subscriber`	subscriber_create
GET	`/api/v1/notification`	notification_list

Workflow

Method	Path	Maps to MCP tool
POST	`/api/v1/alert/{uid}/acknowledge`	alert_acknowledge
POST	`/api/v1/alert/{uid}/resolve`	alert_resolve
POST	`/api/v1/alert/{uid}/escalate`	alert_escalate
POST	`/api/v1/sensor/{uid}/evaluate`	alert_evaluate

Introspection

Method	Path	Maps to MCP tool
GET	`/api/v1/alert/summary`	alert_summary
GET	`/api/v1/sensor/{uid}/status`	sensor_status

Service Info

{
  "name": "sentinel",
  "version": "1.0.0",
  "description": "Alerting and monitoring — event lifecycle, sensors, notifications",
  "mcp_path": "/agentspace",
  "health_path": "/health",
  "api_path": "/api/v1",
  "tools_count": 18,
  "resources_count": 2,
  "capabilities": ["alert", "sensor", "subscriber", "notification", "threshold"]
}

Command Layer

# server/mcp/commands.py

from axonis.userspace.alerting import AlertEvent, AlertThreshold, Subscriber, Notification
from axonis.userspace.intelligence import Memory

memory = Memory()

def acknowledge_alert(uid, acknowledged_by, note=""):
    alert = AlertEvent().read(uid=uid)
    update = {"status": "ACKNOWLEDGED", "acknowledged_by": acknowledged_by, "note": note}
    AlertEvent().update(update, uid)
    memory.create({
        "content": f"Acknowledged by {acknowledged_by}: {note}",
        "memory_type": "alert_ack",
        "source_conversation_id": uid,
    })
    return {**alert, **update}

def evaluate_threshold(threshold_id, value):
    """Pure check: report whether value breaches the threshold; creates no alert."""
    threshold = AlertThreshold().read(uid=threshold_id)
    operator = threshold.get("operator", "gt")  # defaults to "greater than"
    limit = threshold.get("value", 0)
    # An unrecognized operator matches no branch, so triggered stays False.
    triggered = (
        (operator == "gt" and value > limit) or
        (operator == "lt" and value < limit) or
        (operator == "eq" and value == limit) or
        (operator == "gte" and value >= limit) or
        (operator == "lte" and value <= limit)
    )
    return {"threshold_id": threshold_id, "value": value, "threshold": limit,
            "operator": operator, "triggered": triggered}

Alert Filters

The alert_list tool supports the following filters:

sensor_type — filter by sensor type (only meaningful when source_type=sensor)
severity — filter by severity level
site_id — filter by site/location
status — filter by alert status (ACTIVE, ACKNOWLEDGED, RESOLVED, ESCALATED)
source_type — filter by origin kind (sensor | rule | model | human | external)
source_uid — filter by specific origin UID

Filters are passed as query parameters in REST and as tool arguments in MCP. The first four are pushed down to ES via Schema.ALERT_FILTERS; source_type and source_uid are applied in Python until axonis-core's ALERT_FILTERS set is extended (tracked separately).
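
The split between pushed-down and Python-side filters might look like the sketch below. The two tuples mirror the filter list above; `split_filters` and `apply_post_filters` are hypothetical helpers, and the real push-down goes through Schema.ALERT_FILTERS, which is not reproduced here.

```python
# Filters the spec says are pushed down to ES.
PUSHDOWN_FILTERS = ("sensor_type", "severity", "site_id", "status")
# Filters applied in Python until ALERT_FILTERS is extended.
POST_FILTERS = ("source_type", "source_uid")


def split_filters(params):
    """Separate supplied filters into ES push-down terms and Python-side terms."""
    pushdown = {k: params[k] for k in PUSHDOWN_FILTERS if params.get(k) is not None}
    post = {k: params[k] for k in POST_FILTERS if params.get(k) is not None}
    return pushdown, post


def apply_post_filters(hits, post):
    """Keep only hits whose fields equal every Python-side filter value."""
    return [h for h in hits if all(h.get(k) == v for k, v in post.items())]


params = {"severity": "high", "site_id": None, "source_type": "rule"}
pushdown, post = split_filters(params)
hits = [
    {"alert_id": "a1", "severity": "high", "source_type": "rule"},
    {"alert_id": "a2", "severity": "high", "source_type": "sensor"},
]
filtered = apply_post_filters(hits, post)
```

One consequence worth keeping in mind: filtering after the ES query can return fewer results than the requested page size, which is a reason to move source_type and source_uid into the pushed-down set once axonis-core supports them.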

Alert Lifecycle Pipeline

Beyond the request-scoped CRUD/workflow tools above, Sentinel owns an end-to-end evaluation pipeline that turns raw source readings into notifications and signals. The stages are source-agnostic; sensors are the canonical case but the same flow applies to any source_type.

Trigger flow

Source reading (sensor poll, rule trip, model score, external POST)
    |
    | 1. Normalize to standard reading schema, persist
    v
Threshold evaluation
    | 1. Resolve effective threshold (per-source override, else default)
    | 2. Evaluate reading vs threshold (alert_evaluate semantics)
    | 3. Check cooldown (see #cooldown) — skip if within min report interval
    | 4. If exceeded AND not in cooldown:
    |    a. Query matching subscribers
    |    b. Filter by min_severity and quiet hours
    |    c. Dispatch notifications (per channel)
    |    d. Write AlertEvent (status=triggered) — also acts as cooldown marker
    |    e. Write Notification record(s)
    |    f. Emit Signal to cortex via signal_create (see #signal-integration)
    v
Subscriber notified  +  ADI shows Signal in Monitor
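
The trigger flow can be sketched as a single function with injected collaborators. Everything here is an assumption made for illustration: `process_reading` and all of its callable parameters are hypothetical, the comparison is hard-coded to "greater than", and persistence is reduced to callbacks so the sketch stays independent of Elasticsearch and cortex.

```python
def process_reading(reading, threshold, *, in_cooldown, find_subscribers,
                    dispatch, write_alert, write_notification, emit_signal):
    """Run one pass of the trigger flow for an already-normalized reading."""
    # Step 2: evaluate the reading against the effective threshold ("gt" only here).
    if reading["value"] <= threshold["value"]:
        return None
    # Step 3: skip when a recent AlertEvent already covers this source and threshold.
    if in_cooldown(reading, threshold):
        return None
    alert = {
        "source_type": reading["source_type"],
        "source_uid": reading["source_uid"],
        "threshold_name": threshold["threshold_name"],
        "severity": threshold["severity"],
        "status": "triggered",
        "current_value": reading["value"],
        "threshold_value": threshold["value"],
    }
    # Steps 4a-4c: find_subscribers is expected to apply min_severity and quiet hours.
    notifications = [
        dispatch(subscriber, channel, alert)
        for subscriber in find_subscribers(alert)
        for channel in subscriber["notification_channels"]
    ]
    alert["notifications_sent"] = len(notifications)
    # Step 4d: the written AlertEvent doubles as the cooldown marker.
    write_alert(alert)
    # Step 4e: one Notification record per dispatched channel.
    for notification in notifications:
        write_notification(notification)
    # Step 4f: emit the Signal and keep the id the owning service returns.
    alert["signal_id"] = emit_signal(alert)
    return alert


written = []
result = process_reading(
    {"source_type": "sensor", "source_uid": "hcfcd_001", "value": 25.3},
    {"threshold_name": "minor_flood", "severity": "high", "value": 24.0},
    in_cooldown=lambda reading, threshold: False,
    find_subscribers=lambda alert: [{"name": "Jane Doe", "notification_channels": ["sms"]}],
    dispatch=lambda subscriber, channel, alert: {"channel": channel, "status": "sent"},
    write_alert=written.append,
    write_notification=written.append,
    emit_signal=lambda alert: "abc123-def456",
)
```

In this sketch signal_id is set on the in-memory dict after the alert is written; a real implementation would need a follow-up update to persist the cross-reference.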

Clear flow

When readings return below threshold, the evaluator transitions the matching AlertEvent (sets cleared_at and status) and emits a resolved Signal to cortex (signal_create). Per #invariants, the AlertEvent is status-transitioned, never deleted.
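
A minimal sketch of the clear transition, assuming the AlertEvent is a dict and that the cleared state is recorded as status "cleared" (the value used in the Alert → Signal status table). `clear_alert` is a hypothetical helper; emitting the resolved Signal is left out.

```python
from datetime import datetime, timezone


def clear_alert(alert, now=None):
    """Return a copy of the alert transitioned to cleared; the record is kept, not deleted."""
    if alert.get("cleared_at") is not None:
        return alert  # already cleared; nothing to transition
    now = now or datetime.now(timezone.utc)
    return {**alert, "status": "cleared", "cleared_at": now.isoformat()}


active = {"alert_id": "alert_xyz789", "status": "triggered", "cleared_at": None}
cleared = clear_alert(active, now=datetime(2026, 3, 1, 16, 0, tzinfo=timezone.utc))
```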

Threshold Configuration

Thresholds resolve from two layers, override taking precedence:

Defaults — base threshold definitions per source/sensor type (platform-level configuration). New source types are onboarded by adding a section here; no code change required.
Per-source overrides — stored as AlertThreshold records (Schema.THRESHOLD, alerts index) keyed by source/site; take precedence over defaults.

Resolution order: per-source override checked first, then the type default.
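
The two-layer resolution can be sketched as below. The nested layout of `defaults` and the rule that a disabled override is ignored are assumptions; the override fields follow the AlertThreshold example in Object Schemas, and `resolve_threshold` is a hypothetical name.

```python
def resolve_threshold(defaults, overrides, sensor_type, site_id, threshold_name):
    """Start from the type default, then apply a matching enabled per-source override."""
    effective = dict(defaults[sensor_type][threshold_name])
    for override in overrides:
        if (override.get("enabled", True)
                and override["sensor_type"] == sensor_type
                and override["site_id"] == site_id
                and override["threshold_name"] == threshold_name):
            if "override_value" in override:
                effective["value"] = override["override_value"]
            if "min_report_interval_sec" in override:
                effective["min_report_interval_sec"] = override["min_report_interval_sec"]
            break  # first matching override wins in this sketch
    return effective


defaults = {
    "water_level": {
        "minor_flood": {"operator": "gt", "value": 24.0, "severity": "high",
                        "min_report_interval_sec": 1800},
    },
}
overrides = [
    {"site_id": "hcfcd_001", "sensor_type": "water_level", "threshold_name": "minor_flood",
     "override_value": 26.0, "min_report_interval_sec": 900, "enabled": True},
]
with_override = resolve_threshold(defaults, overrides, "water_level", "hcfcd_001", "minor_flood")
default_only = resolve_threshold(defaults, overrides, "water_level", "hcfcd_002", "minor_flood")
```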

Evaluation modes

A threshold evaluates in one of two modes:

Fixed value — the threshold's value field holds the comparison number (used by evaluate_threshold / alert_evaluate).
Compare field — compare_field references another field in the source reading; the reading's value is compared against that field rather than a constant.
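
The two modes differ only in where the comparison limit comes from, as the sketch below shows. It assumes the threshold names the reading field to inspect in a `field` key (borrowed from the AlertEvent example) and reuses the operator names from the command layer; `evaluate_reading` and the `top_of_bank_ft` field are hypothetical.

```python
import operator

# Operator names as used by evaluate_threshold in the command layer.
COMPARATORS = {
    "gt": operator.gt,
    "lt": operator.lt,
    "eq": operator.eq,
    "gte": operator.ge,
    "lte": operator.le,
}


def evaluate_reading(reading, threshold):
    """Compare reading[field] against a fixed value or another field of the same reading."""
    observed = reading[threshold["field"]]
    compare_field = threshold.get("compare_field")
    # Compare-field mode takes the limit from the reading; fixed mode uses "value".
    limit = reading[compare_field] if compare_field else threshold["value"]
    compare = COMPARATORS[threshold.get("operator", "gt")]
    return compare(observed, limit)


reading = {"stream_level_current_ft": 25.3, "top_of_bank_ft": 27.0}
fixed = {"field": "stream_level_current_ft", "operator": "gt", "value": 24.0}
relative = {"field": "stream_level_current_ft", "operator": "gt",
            "compare_field": "top_of_bank_ft"}
fixed_triggered = evaluate_reading(reading, fixed)
relative_triggered = evaluate_reading(reading, relative)
```

Unlike the command-layer function, this sketch raises KeyError on an unknown operator instead of quietly reporting "not triggered"; which behavior is preferable is a design choice the spec leaves open.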

Severity levels

Level	Priority	Description
`warning`	Low	Initial threshold breach
`high`	Medium	Elevated concern, rapid changes
`critical`	High	Severe threshold breach
`extreme`	Highest	Emergency condition

Cooldown Logic

A minimum report interval prevents notification storms when a reading oscillates near a threshold boundary. Cooldown is state-derived from the alerts index (no external store): before dispatching, query for the most recent AlertEvent matching (source_type, source_uid, threshold_name) with triggered_at >= now - interval (size=1, sorted triggered_at desc).

Found → within cooldown → skip (no notification, no new AlertEvent).
Not found → dispatch; the written AlertEvent becomes the cooldown marker for the next evaluation.

The interval is sourced from the effective threshold's min_report_interval_sec (per-source override first, then default — same resolution order as #threshold-config).
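
The cooldown lookup could be expressed as an Elasticsearch query body along the lines below, paired with an in-memory equivalent that makes the decision rule explicit. Both functions are hypothetical, the `term` clauses assume the three identity fields are mapped as keywords, and the in-memory version assumes `triggered_at` is already a timezone-aware `datetime`.

```python
from datetime import datetime, timedelta, timezone


def cooldown_query(source_type, source_uid, threshold_name, interval_sec):
    """Build a query body for the most recent matching AlertEvent inside the interval."""
    return {
        "size": 1,
        "sort": [{"triggered_at": {"order": "desc"}}],
        "query": {"bool": {"filter": [
            {"term": {"source_type": source_type}},
            {"term": {"source_uid": source_uid}},
            {"term": {"threshold_name": threshold_name}},
            {"range": {"triggered_at": {"gte": f"now-{interval_sec}s"}}},
        ]}},
    }


def in_cooldown(alerts, source_type, source_uid, threshold_name, interval_sec, now):
    """In-memory equivalent: True if a matching alert fired within the interval."""
    cutoff = now - timedelta(seconds=interval_sec)
    return any(
        a["source_type"] == source_type
        and a["source_uid"] == source_uid
        and a["threshold_name"] == threshold_name
        and a["triggered_at"] >= cutoff
        for a in alerts
    )


alerts = [{
    "source_type": "sensor", "source_uid": "hcfcd_001", "threshold_name": "minor_flood",
    "triggered_at": datetime(2026, 3, 1, 14, 30, tzinfo=timezone.utc),
}]
query = cooldown_query("sensor", "hcfcd_001", "minor_flood", 900)
ten_min_later = in_cooldown(alerts, "sensor", "hcfcd_001", "minor_flood", 900,
                            now=datetime(2026, 3, 1, 14, 40, tzinfo=timezone.utc))
twenty_min_later = in_cooldown(alerts, "sensor", "hcfcd_001", "minor_flood", 900,
                               now=datetime(2026, 3, 1, 14, 50, tzinfo=timezone.utc))
```

With a 900-second interval, a reading evaluated ten minutes after the last AlertEvent is suppressed, while one evaluated twenty minutes after is dispatched.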

Subscriber Routing

Subscribers carry one or more subscriptions and delivery preferences that the pipeline filters against at dispatch time:

subscriptions[] — each binds a source_type/sensor_type, a list of site_ids ("*" = all), a min_severity floor, and an active flag. A subscriber is matched when an active subscription's type and site match the alert and the alert severity ≥ min_severity.
notification_channels — ordered channels (e.g. sms); each produces a Notification record.
quiet_hours — optional {enabled, start, end} window in the subscriber's timezone; alerts falling inside the window are suppressed for that subscriber.

Each dispatched channel produces a Notification record (Schema.NOTIFICATION, immutable per #invariants) capturing channel, destination, provider, provider message id, status, and timestamps.
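
The matching and quiet-hours rules above can be sketched as follows. The severity ranking follows the order of the Severity levels table; treating the quiet-hours window as start-inclusive and end-exclusive, and passing each subscriber's local time in through a `local_time_for` callback instead of resolving the timezone here, are assumptions. All function names are hypothetical.

```python
from datetime import time

# Lowest to highest, per the Severity levels table.
SEVERITY_RANK = {"warning": 0, "high": 1, "critical": 2, "extreme": 3}


def subscription_matches(subscription, alert):
    """True when an active subscription covers the alert's type, site, and severity."""
    if not subscription.get("active", False):
        return False
    if subscription["sensor_type"] != alert["sensor_type"]:
        return False
    site_ids = subscription["site_ids"]
    if "*" not in site_ids and alert["site_id"] not in site_ids:
        return False
    return SEVERITY_RANK[alert["severity"]] >= SEVERITY_RANK[subscription["min_severity"]]


def in_quiet_hours(quiet_hours, local_time):
    """True when local_time falls in the window; handles windows that cross midnight."""
    if not quiet_hours or not quiet_hours.get("enabled"):
        return False
    start = time.fromisoformat(quiet_hours["start"])
    end = time.fromisoformat(quiet_hours["end"])
    if start <= end:
        return start <= local_time < end
    return local_time >= start or local_time < end


def route(subscribers, alert, local_time_for):
    """Return subscribers to notify; local_time_for(subscriber) supplies their local time."""
    return [
        s for s in subscribers
        if s.get("active", True)
        and any(subscription_matches(sub, alert) for sub in s["subscriptions"])
        and not in_quiet_hours(s.get("quiet_hours"), local_time_for(s))
    ]


alert = {"sensor_type": "water_level", "site_id": "hcfcd_001", "severity": "high"}
subscribers = [
    {"name": "Jane Doe", "active": True,
     "subscriptions": [{"sensor_type": "water_level", "site_ids": ["*"],
                        "min_severity": "warning", "active": True}],
     "quiet_hours": {"enabled": False, "start": "22:00", "end": "06:00"}},
    {"name": "Night Sleeper", "active": True,
     "subscriptions": [{"sensor_type": "water_level", "site_ids": ["hcfcd_001"],
                        "min_severity": "warning", "active": True}],
     "quiet_hours": {"enabled": True, "start": "22:00", "end": "06:00"}},
    {"name": "Critical Only", "active": True,
     "subscriptions": [{"sensor_type": "water_level", "site_ids": ["*"],
                        "min_severity": "critical", "active": True}]},
]
late = [s["name"] for s in route(subscribers, alert, lambda s: time(23, 30))]
midday = [s["name"] for s in route(subscribers, alert, lambda s: time(12, 0))]
```

The sketch matches on sensor_type only; matching on source_type for non-sensor alerts would follow the same pattern.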

Signal Integration (ADI)

Alerting and the ADI signal surface serve different audiences and are intentionally distinct:

Concern	Alerting	Signal (ADI)
Audience	Subscribers (phone/channel targets)	People with roles (accountability packs)
Action	Notify immediately	Investigate, decide, attest over hours/days
Output	Notification delivered	Auditable decision record with evidence
Storage	`alerts` index	`intelligence` index
Question	"Did we notify?"	"Did we respond properly?"

Sentinel manages the notification infrastructure; the signal it emits feeds the accountability record consumed by the ADI investigation workflow (Cortex/Beacon).

Dual-path signal ingestion

Signals reach the intelligence index by two complementary, first-class paths producing identical Signal v2 documents:

Path 1 — pipeline-emitted. As a side effect of threshold evaluation + notification, Sentinel maps the AlertEvent to a Signal v2 document and emits it to cortex (the owning service) via cortex's signal_create surface — cortex validates it against the signal governance rules (severity/dedup/status) and persists it to the intelligence index. Sentinel never writes the intelligence index directly; routing through cortex ensures the governance ceremony is applied rather than bypassed by a raw index write. The AlertEvent cross-references the signal_id cortex returns. Source: platform-evaluated thresholds.
Path 2 — direct push. External systems push a Signal directly (e.g. PUT /userspace/signal/{signal_id}) without going through the alerting pipeline; they own their threshold/state logic. Source: source-evaluated conditions (webhooks, polling jobs, correlation engines).

Both feed the same ADI accountability flow: Cortex loads the user's accountability pack, filters by signal_type and severity, surfaces the signal in Beacon's role-filtered Signal Queue, where a user opens an Investigation, pins evidence, selects a decision template, and a reviewer attests (separation of duties) before the edition is frozen and tasks dispatched.

Alert → Signal mapping

When the pipeline (Path 1) converts an AlertEvent to a Signal:

Severity:

Alert severity	Signal severity
`warning`	`medium`
`high`	`high`
`critical`	`critical`
`extreme`	`critical`

Status:

Alert status	Signal status
`triggered`	`new`
`cleared`	`resolved`
`acknowledged`	`acknowledged`

Fields:

Alert field	Signal field
`source_type`/sensor_type (e.g. `water_level`)	`signal_type` (e.g. `sensor_water_level`)
`source_uid` / `site_id`	`subject.id`
`"sensor"` (literal, when source is a sensor)	`subject.type`
`location`	`subject.name`
`message`	`description`
`triggered_at`	`detected_at`

The mapped Signal is submitted to cortex's signal_create, which validates and persists it to the intelligence index with subtype: signal; Sentinel records the returned signal_id on the AlertEvent.
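
The three mapping tables translate fairly directly into code. The sketch below is an illustration under assumptions, not the service's mapper: `alert_to_signal` is a hypothetical name, `signal_type` is built as `{source_type}_{sensor_type}` to reproduce the `sensor_water_level` example, and falling back to `source_type` alone when no sensor_type is present is a guess for non-sensor sources.

```python
SEVERITY_MAP = {"warning": "medium", "high": "high",
                "critical": "critical", "extreme": "critical"}
STATUS_MAP = {"triggered": "new", "cleared": "resolved", "acknowledged": "acknowledged"}


def alert_to_signal(alert):
    """Map an AlertEvent dict to the Signal fields listed in the mapping tables."""
    source_type = alert["source_type"]
    sensor_type = alert.get("sensor_type")
    subject = {
        "id": alert.get("source_uid") or alert.get("site_id"),
        "name": alert.get("location"),
    }
    if source_type == "sensor":
        subject["type"] = "sensor"  # literal, only when the source is a sensor
    return {
        "signal_type": f"{source_type}_{sensor_type}" if sensor_type else source_type,
        "severity": SEVERITY_MAP[alert["severity"]],
        "status": STATUS_MAP[alert["status"]],
        "subject": subject,
        "description": alert["message"],
        "detected_at": alert["triggered_at"],
    }


alert = {
    "source_type": "sensor",
    "source_uid": "hcfcd_001",
    "sensor_type": "water_level",
    "site_id": "hcfcd_001",
    "severity": "high",
    "status": "triggered",
    "message": "Minor flooding at hcfcd_001: 25.3 ft (threshold: 24.0 ft)",
    "triggered_at": "2026-03-01T14:30:00Z",
}
signal = alert_to_signal(alert)
```

An alert severity or status outside the tables raises KeyError here, which surfaces unmapped values early instead of emitting a Signal with a guessed value.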

Object Schemas

Reference field shapes for the alerting objects (the alerts index, see #domain-objects). Cross-cutting envelope fields (uid, timestamps, visibility) follow platform.axonis-core.

AlertEvent

{
  "alert_id": "alert_xyz789",
  "source_type": "sensor",
  "source_uid": "hcfcd_001",
  "sensor_type": "water_level",
  "site_id": "hcfcd_001",
  "threshold_name": "minor_flood",
  "severity": "high",
  "status": "triggered",
  "current_value": 25.3,
  "threshold_value": 24.0,
  "field": "stream_level_current_ft",
  "message": "Minor flooding at hcfcd_001: 25.3 ft (threshold: 24.0 ft)",
  "signal_id": "abc123-def456",
  "triggered_at": "2026-03-01T14:30:00Z",
  "cleared_at": null,
  "notifications_sent": 3,
  "notifications_failed": 0
}

signal_id cross-references the emitted Signal in the intelligence index (see #signal-integration).

Subscriber

{
  "subscriber_id": "sub_abc123",
  "name": "Jane Doe",
  "email": "jane@example.com",
  "phone": "+15551234567",
  "notification_channels": ["sms"],
  "subscriptions": [
    {"sensor_type": "water_level", "site_ids": ["hcfcd_001", "*"], "min_severity": "warning", "active": true}
  ],
  "timezone": "America/Chicago",
  "quiet_hours": {"enabled": false, "start": "22:00", "end": "06:00"},
  "active": true
}

AlertThreshold

{
  "site_id": "hcfcd_001",
  "sensor_type": "water_level",
  "threshold_name": "minor_flood",
  "override_value": 26.0,
  "min_report_interval_sec": 900,
  "enabled": true,
  "created_by": "admin"
}

min_report_interval_sec drives the cooldown window (see #cooldown).

Notification

{
  "notification_id": "notif_def456",
  "alert_id": "alert_xyz789",
  "subscriber_id": "sub_abc123",
  "channel": "sms",
  "destination": "+15551234567",
  "status": "sent",
  "provider": "twilio",
  "provider_message_id": "SM1234567890",
  "message_body": "Minor flooding at hcfcd_001...",
  "sent_at": "2026-03-01T14:30:05Z",
  "delivered_at": "2026-03-01T14:30:08Z",
  "retry_count": 0,
  "error": null
}

Memory Namespace

sentinel — stores alert acknowledgments, resolutions, escalation history.

Migration from fedai-rest

Before the migration, alert objects were served through fedai-rest's generic userspace CRUD
Sentinel takes ownership of the alerts ES index
fedai-rest removes alerting targets from its USERSPACE dict
Oracle gateway routes alert tools to sentinel instead of fedai-rest
axonis-client routing maps alerts index → sentinel service URL

Invariants

Alerts are append-only. AlertEvent records are never deleted, only status-transitioned.
Acknowledge/resolve require a user identity. The acknowledged_by / resolved_by fields are mandatory and come from the auth token.
Threshold evaluation is pure. alert_evaluate computes whether a threshold is exceeded but does NOT create an alert. The caller decides whether to act on the result.
Notifications are immutable. Once created, notification records cannot be modified.
Cooldown is state-derived. The cooldown decision is computed from the alerts index alone (most recent matching AlertEvent within the interval); no separate cooldown store exists.

Test Expectations

CRUD roundtrip tests for all 6 object types
Acknowledge/resolve workflow tests
Threshold evaluation tests (all operators)
Alert filtering tests (by severity, status, site, sensor_type)
Summary aggregation tests
Auth tests (token required for all endpoints)
